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The  objective  of  this  study  was  the  identification  and  extension  of  analytical 
techniques  for  the  accurate  prediction  of  genetic  gain  from  selection  on  binary  threshold 
traits.  First,  four  techniques  of  prediction  were  compared  through  computer  simulation  for 
their  ability  to  predict  genetic  gain  close  to  that  realized  in  a cycle  of  mass  selection.  For 
traits  of  low  heritability  (h^  < 0.3)  all  methods  predicted  gain  close  to  that  realized  by 
selection  across  all  mean  incidence  levels  (5  to  95%),  except  the  method  by  Wolfinger  and 
O'Connell  that  overestimated  the  gain  at  incidences  above  75%.  For  traits  of  moderate  to 
high  heritability  some  methods  overestimated  other  underestimated  the  gain. 

The  second  stage  of  this  study  was  the  development  of  gain  prediction  equations  in 
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the  presence  of  linkage  disequilibrium  in  order  to  quantify  its  impact  on  the  genetic  gain  of 
binary  threshold  traits.  It  is  shown  that  linkage  disequilibrium  affects  both  the  mean  and  the 
variance  of  the  distribution  of  the  underlying  continuous  trait,  affecting  the  gain  on  the 
binary  trait.  The  effect  of  disequilibrium  on  the  variance  of  the  distribution  can  contribute 
as  much  as  39%  of  the  total  genetic  gain.  Contrary  to  continuous  traits,  for  threshold  traits 
the  additive  variance  and  the  heritability  reduce  quickly  in  the  first  few  generations  of 
selection,  but  soon  recover  their  initial  values.  Also  in  opposition  to  continuous  traits  where 
the  effect  of  linkage  on  genetic  gain  occurs  only  after  the  first  generation  of  selection,  this 
effect  occurs  in  the  first  generation  for  binary  threshold  traits. 

The  third  stage  of  this  study  considers  the  prediction  of  genetic  gain  from  selection 
when  data  from  environments  with  different  incidences  have  to  be  analyzed  together.  Several 
techniques  of  prediction  were  compared.  Selection  on  univariate  BLUP  predictions  resulted 
in  the  highest  genetic  gain.  Selection  on  predictions  by  generalized  linear  mixed  model 
resulted  in  slightly  smaller  realized  gain,  but  the  predictions  were  more  accurate  than  those 
from  univariate  BLUP. 

Finally,  as  part  of  this  study,  two  computer  programs  were  developed  that  simulate 
a complete  cycle  of  mass  selection  on  binary  traits,  allowing  comparison  of  results  of 
methods  of  estimation  or  prediction  with  those  realized  by  actual  selection. 
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CHAPTER  1 
INTRODUCTION 


Increasing  genetic  merit  of  populations  through  selection  is  the  major  objective  of 
plant  and  animal  breeding.  In  order  to  maximize  the  increase  in  genetic  merit  of  a population 
through  selection,  accurate  and  precise  predictions  of  the  genetic  merit  of  the  candidates  for 
selection  are  required.  In  fact,  the  prediction  of  genetic  merit  is  among  the  most  important 
contributions  of  quantitative  genetics  to  both  plant  and  animal  breeding  (Hallauer  and 
Miranda  1988). 

Most  commonly  the  traits  used  in  breeding,  and  for  which  the  genetic  merit  of 
candidates  is  required,  are  continuous.  In  this  situation,  methods  based  on  linear  prediction 
developed  mainly  by  Henderson  (1984)  have  been  the  methods  of  choice  in  both  plants 
(White  and  Hodge  1 989)  and  animals  (Mrode  1 996).  However,  many  other  equally  important 
traits  in  breeding  are  measured  as  binary  threshold  traits,  where  individuals  are  scored  in  one 
of  two  categories  (e.g.,  dead/alive,  infected/uninfected),  but  are  imderlain  by  a continuous 
trait.  Examples  of  traits  that  have  been  modeled  as  threshold  traits  are  plant  condition  in  trees 
(Ericsson  and  Danell  1 995);  degree  of  calving  difficulty,  disease  resistance,  conformation 
traits,  fertility,  litter  size,  and  survival  in  animals  (Gianola  1982);  and  pupal  color,  shell 
shape,  wing  dimorphism  and  diapause  in  insects  (Roff  1 994b). 
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In  trees  another  important  trait  that  could  be  modeled  as  a binary  threshold  trait  is 
resistance  to  fusiform  rust.  Resistance  to  fusiform  rust,  although  expressed  as  a continuous 
variable  (different  numbers  and  sizes  of  galls,  and  affected  by  many  environmental  factors), 
is  very  time  consuming  to  evaluate  on  this  scale;  therefore,  in  most  breeding  programs  this 
disease  is  evaluated  as  a binary  trait  (presence/  absence  of  galls).  However,  we  do  not  know 
for  sure  how  applicable  the  threshold  model  is  to  these  traits,  since  for  all  of  them  the 
inheritance  of  either  the  underlying  or  the  observable  trait  is  unknown. 

Prediction  of  genetic  merit  or  gain  for  binary  traits  generally  is  not  an  easy  task, 
because  of  the  many  peculiarities  of  these  traits.  First,  binary  traits  violate  the  major 
assumptions  of  the  standard  analytical  techniques  used  in  quantitative  genetics,  including 
those  for  genetic  gain  prediction  (Gianola  1982).  Second,  parameters  estimated  for  binary 
traits  are  usually  dependent  on  the  incidence  of  the  trait  in  the  population  (Dempster  and 
Lemer  1950;  Gianola  1982).  Third,  selection  intensity  is  limited  by  the  incidence  of  the  trait 
(Falconer  1 989).  Fourth,  genetic  gain  for  binary  traits  is  given  by  the  difference  in  incidences 
(or  probabilities  of  occurrence)  of  a given  phenotype  in  consecutive  generations,  rather  than 
differences  in  means  as  for  continuous  traits.  Therefore,  gain  is  dependent  on  factors 
affecting  both  the  mean  and  the  variance  of  the  distribution  of  the  underlying  trait. 

Many  approaches  have  been  suggested  to  estimate  genetic  parameters  and  predict 
genetic  gain  from  selection  on  binary  traits,  including:  (1)  those  using  standard  techniques 
applied  directly  to  binary  data  as  if  they  were  continuous,  followed  (Dempster  and  Lemer 
1 950;  Gianola  1 982)  or  not  (Dieters  et  al.  1 996)  by  transformation  of  the  estimates  obtained; 
(2)  data  transformation  followed  by  application  of  a standard  technique  of  analysis  for 
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continuous  data  (McGuirk  1989);  and  (3)  more  recently,  techniques  based  on  generalized 
linear  mixed  models  (Schall  1991;  Wolfinger  and  O'Connell  1993).  However,  few  studies 
have  compared  these  techniques  (Meijering  and  Gianola  1985;  Mantysaari  et  al.  1991). 
Usually  no  connection  with  realized  gain  on  a binary  observable  scale  is  set  forth  in  these 
studies,  even  knowing  the  major  interest  of  the  breeder  is  in  genetic  gain  on  this  observable 
scale;  it  is  the  binary  scale  where  progress  is  made  and  it  is  the  only  observable  scale.  Also, 
prediction  of  gain  based  on  parameters  estimated  directly  from  0/1  data,  although  used  in 
forestry  ( Rockwood  and  Goddard  1973;  Sohn  and  Goddard  1979;  Dieters  et  al.  1996)  has 
not  been  adequately  evaluated.  Chapter  2 addresses  this  gap,  evaluating  several  analytical 
techniques  of  genetic  gain  prediction  ranging  from  those  based  on  the  use  of  binary  data  as 
if  they  were  continuous  to  techniques  based  on  generalized  mixed  models.  All  techniques  are 
evaluated  for  their  ability  to  predict  accurately  the  genetic  gain  realized  by  actual  selection 
on  a simulated  binary  trait. 

Another  important  point  in  the  accurate  prediction  of  genetic  gain  is  accounting  for 
the  influence  of  linkage  disequilibrium  generated  by  selection.  Truncation  selection  on 
continuous  traits  induces  linkage  disequilibrium  which  in  turn  reduces  the  additive  variance 
and  hence  the  genetic  gain  (Bulmer  1971;  1976).  For  these  traits  the  effect  of  linkage 
disequilibrium  is  well  documented  in  the  literature  (Bulmer  1971;  1976;  Falconer  1989; 
Wray  and  Hill  1989;  Verrier  et  al.  1989;  Gomez-Raya  and  Burnside  1990;  Dekkers  1992; 
Villanueva  et  al.  1993).  Selection  on  binary  threshold  traits  is  comparable  to  truncation 
selection  on  a continuous  trait.  Therefore,  it  affects  the  additive  variance  of  these  traits 
(underlying  trait)  in  a similar  way  as  for  continuous  traits.  However,  because  of  the 
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peculiarities  of  binary  threshold  traits,  the  effect  of  linkage  disequilibrium  on  gain  for  these 
traits  can  be  quite  different  from  that  on  continuous  traits.  Despite  the  importance  of  this 
subject,  few  studies  have  occurred  in  this  area  (Roff  1994a).  In  chapter  3,  the  effect  of 
linkage  disequilibrium  caused  by  selection  on  binary  traits  is  considered  and  formulas  of  gain 
prediction  in  the  presence  of  linkage  disequilibrium  derived. 

Another  difficulty  of  prediction  of  genetic  gain  for  binary  traits  arises  in  breeding 
programs  when  data  from  different  test  locations  must  be  combined.  In  particular,  for  binary 
traits  it  is  common  to  combine  information  from  tests  with  different  mean  incidence  levels 
of  the  trait.  Also,  usually  the  breeder  has  a target  environment  (specific  incidence)  in  mind 
for  ranking  the  candidates  to  facilitate  selection  on  the  same  basis.  For  example,  in  forestry 
it  is  common  to  combine  fusiform  rust  data  (a  fungal  disease  of  pines)  from  different  test 
locations  with  different  incidences  to  predict  breeding  values  for  a single  target  environment 
having  50%  rust  incidence. 

For  continuous  traits  the  use  of  linear  covariances  to  translate  information  from 
different  sources  is  justified  based  on  normality  and  linearity  (Meijering  and  Gianola  1 985). 
However,  binary  traits  are  not  normally  distributed,  their  variance  is  dependent  on  their 
incidence  in  the  population,  and  breeding  values  obtained  from  locations  with  different 
incidences  are  not  linearly  related.  In  order  to  overcome  these  problems,  nonlinear  methods 
have  been  suggested  (Gilmour  etal.  1985;  Wolfinger  and  O'Connell  1 993).  However,  linear 
methods  are  still  the  most  used  in  animal  (Meijering  and  Gianola  1985)  and  tree  breeding 
(White  and  Hodge  1 989).  Therefore,  it  is  of  interest  to  compare  these  methods  as  a prediction 
tool,  when  the  levels  of  the  fixed  effects  present  different  incidences  of  the  trait. 
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Unfortunately  few  studies  have  been  done  in  this  area  (Meijering  and  Gianola  1985).  Chapter 
4 was  motivated  by  this  lack  of  information.  In  that  chapter,  genetic  merit  of  candidates 
predicted  by  linear,  generalized  linear,  univariate,  and  bivariate  methods  are  compared  under 
different  combinations  of  incidences  in  tests. 

Finally,  the  study  of  threshold  traits  requires  software  that  adequately  models  those 
traits  and,  for  some  specific  studies,  models  selection  on  those  traits.  Chapter  5 presents  two 
programs  developed  in  SAS®  (S AS  Institute,  1 990)  that  allow  the  generation  of  binary 
threshold  data  and  the  simulation  of  a complete  cycle  of  breeding  for  a binary  threshold  trait, 
as  might  be  common  in  a breeding  program. 


CHAPTER  2 

GENETIC  GAIN  FROM  MASS  SELECTION  ON  BINARY  THRESHOLD  TRAITS 

Introduction 

Although  most  traits  in  breeding  are  measured  on  a continuous  scale,  some  important 
traits  are  restricted  to  two  classes,  the  so-called  binary  traits  (e.g.,  dead/alive, 
infected/uninfected).  Binary  phenotypes  may  have  a discontinuous  genetic  backgroimd 
involving  one  or  few  major  genes,  or  a continuous  genetic  background  involving  many  minor 
genes.  The  analysis  of  binary  traits  of  the  first  type  is  primarily  based  on  Mendelian  genetics. 
The  analysis  of  the  second  type  of  trait,  called  all-or-none  or  binary  threshold  traits,  is  much 
more  complex  since  quantitative  genetics  techniques  are  required  and  it  is  known  that 
categorical  data  violate  the  major  assumptions  involved  in  many  such  analyses.  Most 
techniques  of  analysis  for  this  second  type  of  trait  are  based  on  the  threshold,  multifactorial 
model  presented  by  Wright  (1934).  According  to  this  model,  the  measured  binary  trait  is 
underlain  by  a continuous  trait  (sometimes  called  liability)  controlled  by  many  minor, 
additive  genes  and  environmental  factors.  The  discreteness  in  the  binary  trait  is  given  by  a 
threshold,  such  that  if  the  underlying  continuous  trait  is  larger  than  some  threshold  value  one 
phenotype  is  observed  (e.g.,  infected),  otherwise  the  alternative  phenotype  is  observed  (e.g., 
vminfected). 
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Dempster  and  Lemer  (1950)  showed  that  heritability  estimates  based  on  variance 
components  obtained  from  binary  data  was  dependent  on  the  incidence  of  the  trait  in  the 
population.  Based  on  Wright's  (1934)  threshold  model,  these  authors  suggested  that  binary 
heritability  should  be  converted  to  an  imderlying  continuous  scale  to  be  independent  of  the 
frequency  of  the  trait.  This  technique  has  been  the  most  studied  and  used  in  breeding. 
Magnussen  and  Kremer  (1995)  suggested  a method  of  estimation  assuming  that  the  random 
effects  (the  genetic  entities)  are  distributed  as  beta-binomial.  New  techniques  of  estimation 
have  also  been  suggested  in  the  context  of  generalized  linear  mixed  models  (Schall  1991; 
Wolfinger  and  O'Connell  1993).  The  method  by  Wolfinger  and  O'Connell  (1993)  uses  an 
iterative  process  to  fit  a weighted  linear  mixed  model  to  a transformed  variable  by  a link 
function,  using  a pseudo-likelihood  estimation  procedure.  This  method  is  implemented  in 
a program  (macro)  called  GLIMMIX  written  in  S AS®,  using  the  MIXED  procedure  (Littell 
et  al.  1996).  The  method  by  Schall  (1991)  is  based  on  an  approach  suggested  by  Stiratelli  et 
al.  (1984)  for  solution  of  fixed  and  random  effects  and  dispersion  (variance)  parameters  in 
a generalized  linear  mixed  model  context.  This  method  is  implemented  in  the  Fortran 
package  ASREML  (Gilmour  et  al.  1 997). 

Other  methods  for  estimation  of  heritability  of  binary  threshold  traits  are  data 
transformation  (Rockwood  and  Goddard  1973;  Sohn  and  Goddard  1979),  mixed  model 
analysis  (Harville  and  Mee  1 984;  Gilmour  et  al.  1 985),  Bayesian  methods  (Hoeschele  et  al. 
1987),  and  standard  linear  methods  applied  directly  on  0/1  data  as  if  they  were  continuous 
(Dieters  et  al.  1 996).  The  accuracy  of  some  of  these  methods  in  estimating  heritability  has 
been  evaluated  (van  Vleck  1972;  Mantysaari  et  al.  1991;  Magnussen  and  Kremer  1995). 
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However,  the  methods  have  not  been  adequately  evaluated  for  their  ability  to  predict  genetic 
gain  from  selection  based  on  binary  traits. 

Two  important  points  require  attention  when  comparing  methods  for  estimation  of 
heritability  or  gain  of  binary  threshold  traits.  First,  the  major  interest  of  the  breeder  is  in 
genetic  gain  on  the  observable  binary  scale,  because  this  is  the  scale  where  progress  is  made 
and  is  the  only  observable  scale.  Yet,  usually  no  connection  with  realized  gain  in  this  scale 
has  been  set  forth  by  investigators  when  comparing  methods,  and  usually  the  analyses  only 
compare  heritability  values  on  the  continuous  unobservable  scale.  Second,  the  heritability 
estimated  directly  from  0/1  data,  although  the  oldest  method  and  one  in  use  in  some  fields 
such  as  forestry  (Dieters  et  al.  1 996),  has  not  been  adequately  evaluated. 

The  major  objectives  of  this  chapter  were  to  compare  the  following  types  of  binary 
genetic  gains  using  a larger  number  of  samples  than  previous  works:  a)  gain  realized  from 
simulations  of  a complete  cycle  of  mass  selection  breeding  (taken  as  the  standard  against 
which  other  methods  are  compared);  b)  gain  predicted  using  heritability  estimated  directly 
from  0/1  data  (ANOVA  method);  c)  gain  predicted  using  heritability  on  the  underlying  scale 
converted  from  that  obtained  in  the  ANOVA  method  by  Dempster  and  Lemer's  (1950) 
formula  (DL  method);  d)  gain  predicted  using  heritability  on  the  underlying  scale  obtained 
using  generalized  mixed  model  as  described  by  Wolfinger  and  O'Connell  (1993)  and 
implemented  in  the  SAS®  macro  called  GLIMMIX  described  in  Littell  et  al.  (1996)  (WO 
method);  and  e)  gain  predicted  using  heritability  on  the  underlying  scale  obtained  using 
generalized  mixed  models  as  described  by  Schall  (1991)  and  implemented  in  the  ASREML 
package  by  Gilmour  et  al.  (1997)  (SCHALL  method). 
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Material  and  Methods 

For  each  of  several  different  situations,  simulation  produced  a complete  generation 
of  mass  selection  on  a binary  trait  as  might  be  common  in  a breeding  program.  The 
simulation  steps  for  a given  generation  were  as  follows:  generate  a base  population,  make 
selections,  allow  random  intermating  among  selections,  and  generate  a progeny  population. 
The  simulations  mimic,  for  example,  populations  of  plants  with  different  frequencies  of 
infected  individuals  with  a disease,  where  uninfected  individuals  are  selected,  intermated  and 
the  resulting  progeny  evaluated.  For  convenience,  this  binary  trait  will  be  referred  to  as 
infected/uninfected  by  a hypothetical  disease. 

The  objective  was  to  identify  methods  that  accurately  predict  the  reduction  in 
incidence  of  this  disease  in  the  progeny  population  due  to  mass  selection  of  uninfected 
parents.  For  each  level  of  the  true  heritability  on  the  underlying  scale  (h^  = 0.10, 0.30, 0.50, 
0.70,  and  0.90),  300  independent  base  populations  were  simulated,  each  with  80  half-sib 
families  and  24  individuals  per  family.  Each  population  was  exposed  to  five  levels  of 
incidence  of  the  disease  (5, 25,  50, 75,  and  95%),  resulting  in  7500  heritability-  incidence- 
population  combinations  (5  incidences*  5 heritabilities*  300  populations).  In  each  of  these 
7500  combinations,  80  individuals  were  mass  selected  based  on  the  binary  trait  (i.e., 
disease-free  individuals  selected)  and  randomly  intermated  among  themselves,  to  generate 
a population  of  offspring  containing  80  families  and  24  individuals  per  family.  At  the  same 
time,  in  each  of  these  7500  first-generation  base  populations,  genetic  parameters  were 
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estimated  by  several  analytical  methods  to  predict  the  gain  made  by  selection.  The  difference 
between  the  incidences  in  the  progeny  of  the  selected  population  and  the  parental  base 
population  was  used  as  a measure  of  the  realized  genetic  gain  (reduction  in  incidence  of  the 
binary  trait  from  one  generation  of  mass  selection).  This  realized  gain  was  the  standard  used 
to  compare  the  analytical  methods  of  gain  prediction. 

Simulations 

To  allow  computation  of  both  predicted  and  realized  gains  after  selection  on  a binary 
trait,  a complete  cycle  of  breeding  was  simulated:  parental  base  population,  mass  selection 
to  form  the  parental  selected  population,  and  random  mating  of  selected  parents  to  form  the 
progeny  population.  Three  hundred  progeny  tests  were  simulated  for  each  combination  of 
true  parameters.  Each  test  or  sample  had  80  half-sib  families  with  24  individuals  per  family, 
hypothetically  planted  in  a completely  randomized  design  in  a single  environment,  called  the 
base  population  or  BP  test.  In  total  there  were  300  BP  tests*  80  families*  24  individuals/ 
family  = 576,000  observations  per  heritability  and  incidence  level.  The  true  heritabilities  on 
the  underlying  scale  in  these  simulations  were  0.10,  0.30, 0.50, 0.70,  and  0.90.  Results  for 
0.7  are  not  shown,  because  the  general  pattern  was  the  same  as  that  shown  for  heritability 
0.9. 

Two  traits  were  simulated  following  the  threshold  model:  an  underlying  continuous 
trait  (y(o)ij)  and  the  resultant  binary  trait  (X(o)y).  The  model  used  to  simulate  the  continuous 
trait  was 


y(0)ij  ~ P(0)  f(0)i  W(0)ij  — P(0)  + f(0)i  n(0)ij  + C(o)ij 


(2-1) 
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where 

yjj(o)  is  the  underlying  (unobservable)  continuous  trait  in  the  base  population 
(generation  zero); 

P(o)  is  the  general  mean; 

f(o)i  is  the  random  effect  of  the  i***  female  general  combining  ability, 
i=l,...,80,fi  ~N(0,l/4a\); 

W(o)ij  is  the  random  effect  of  the  j*  individual  within  i*  family,)  =1,...,24, 

W(o)ij  ~ N(0,  3/40^^  + 

U(o)ij  is  the  random  effect  associated  to  the  j‘*'  male  general  combining  ability 
and  the  Mendelian  sampling  of  the  ij*  individual,  U(o)ij  ~ N(0,  3/4a^J;  and 

e(o)ij  is  the  error  associated  to  the  j*  individual  of  the  i*  family  in  the  base 
population,  e(o)ij  ~ N(0,  a\). 

In  the  simulations  of  the  base  population,  P(o>  was  assumed  zero.  The  effects  f(o)i,  U(o)ij 
and  e(o)ij  were  drawn  from  a normal  distribution  with  zero  mean  and  variances  l/4o^a , 3/4o^a, 
and  respectively.  Covariances  among  all  effects  in  model  2-1  were  assumed  to  be  zero. 
The  true  breeding  values  (U(o)ij)  of  the  1 920  individuals  in  a given  sample  (80  families  * 24 
individuals)  were  obtained  by  a(o)ij  = f(0)i  + U(o)ij  in  the  base  population.  These  breeding  values 
were  retained  and  used  in  the  generation  of  the  progenies  of  the  selected  individuals,  as 
described  in  equation  2-3 . 

The  binary  trait  (X(o)y)  was  generated  for  each  ij^''  individual  by  imposing  a threshold 


(T)  in  the  population  of  y(o)ij’s  as 
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0 if  y(0)ij  ^ T (desirable  phenotype) 


X(0)ij  - 


' if  y*  > T (undesirable  phenotype) 


(2-2) 


Different  threshold  values  were  imposed  in  the  distribution  of  the  continuous  trait  to 
simulate  five  incidences  of  the  binary  trait,  namely  5, 25,  50,  75,  and  95%  incidence  of  the 
imdesirable  phenotype  (i.e.,  diseased  individuals). 

From  each  simulated  base  population  (BP),  mass  selection  was  performed  on  the 
binary  trait,  selecting  80  uninfected  individuals  (individuals  with  X(o)ij  = 0)  to  advance  to  the 
next  generation.  These  80  individuals  were  intermated  by  randomly  crossing  each  female 
with  24  different  males  out  of  the  80  available,  generating  24  half-sib  offsprings  per  cross 
(one  progeny  for  each  male  parent).  No  action  was  taken  to  avoid  reciprocal  crosses.  These 
80  simulated  half-sib  families  were  planted  in  an  experiment,  called  Progeny  of  Selected 
population  or  PS  tests,  with  the  same  design  as  the  parental  base  population  described  above, 
that  is,  24  individuals  from  each  of  the  80  families  completely  randomized. 

The  phenotypic  values  of  the  progenies  of  selected  parents  were  obtained  according 
to  the  model 


y(l)ij  ~ P(l)  ('^  ^0)if  3(0)im)  V(i)jj  + e 


(l)ij 


(2-3) 


where 


yij(i)  is  the  underlying  continuous  trait  in  the  progeny  population  (generation  1); 
a(o)if  is  the  breeding  value  of  the  female  parent  of  the  i"*  individual  in  generation  zero; 
a^o)im  is  the  breeding  value  of  the  male  parent  of  the  i‘*’  individual  in  generation  zero; 
V(i)ij  is  the  random  effect  associated  to  the  Mendelian  sampling  of  the  ij*  individual, 
V(i)ij  - N(0,  l/2o\);  and 
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e(i)ij  is  the  error  associated  with  the  j*’’  individual  of  the  i*  family  in  the  progeny 
population,  e(|)jj  ~ N(0,  a\). 

In  model  2-3,  and  e(i)jj  are  drawings  from  a normal  distribution  with  zero  mean 
and  variances  l/2a^a  and  o^e»  respectively.  Further,  a(o)if  and  a(o)im  are  true  breeding  values,  on 
the  underlying  scale,  of  the  individuals  selected  from  the  parental  base  population.  These 
were  retained  for  all  parents  from  the  simulation  of  the  parental  base  population. 

The  same  threshold  values  used  in  the  BP  were  also  applied  in  the  progeny  of 
selected  individuals  (PS)  to  generate  the  binary  variable  X(i)jj  (equation  2-2). 

Realized  Gain  on  the  Binary  Scale 

Realized  gain  on  the  binary  scale  (Gr(x))  was  obtained  as  the  absolute  value 
(providing  positive  values)  of  the  difference  between  the  incidence  of  the  imdesirable 
phenotype  in  the  BP  and  PS  populations  described  above,  for  each  sample  as 

Gr(x)  = |P(i)  ■ P(o)l  (2-4) 

where 

Gr(x)  is  the  realized  gain  on  the  binary  scale; 

p(i)  is  the  mean  incidence  of  the  undesirable  phenotype  in  the  progeny  of  the  selected 
parents  for  the  binary  trait;  and 

P(o)  is  the  mean  incidence  of  the  undesirable  phenotype  in  the  base  population  for  the 
binary  trait. 

Three  hundred  realized  gains  were  obtained  for  each  level  of  heritability  and  each 
level  of  incidence  of  the  base  population.  Therefore,  7500  realized  gains  were  obtained  (5 
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heritabilities*5  incidences*300  populations).  Since  the  realized  binary  gain  is  the  one  that 
can  be  observed  after  selection,  and  quantified  by  the  breeder,  this  was  used  as  a reference 


Predicted  Gains  on  the  Binary  Scale 

The  first  gain  predicted  was  obtained  by  estimating  variance  components  by  the 
ANOVA  method  (Type  I method  of  SAS®)  on  the  BP  tests  using  0/1  data  directly  and 
obtaining  heritability  using  these  components  as 


where 

h^ANovA(x)  is  the  individual  heritability  on  the  binary  scale  (x); 
is  the  family  variance  component;  and 
is  the  error  variance  component. 

The  heritability  obtained  by  equation  2-5  was  then  used  to  predict  gain  on  the  binary 


of  comparison  for  the  subsequent  methods  of  predicting  genetic  gains. 


(2-5) 


scale  by 


(2-6) 


where 


is  the  predicted  gain  on  the  binary  scale  using  heritability  estimated 


directly  on  the  0/1  data; 


is  the  individual  heritability  on  the  binary  scale;  and 


S(x)  is  the  selection  differential  on  the  binary  scale  (S(x)  = 0 - P(o)). 


A second  gain  was  predicted  using  the  method  described  by  Dempster  and  Lemer 
(1950)  (referred  to  here  as  DL  method).  The  heritability  on  the  binary  scale  (h^ANovA(x)  > 
equation  2-5)  was  converted  to  the  underlying  scale,  h^oL(y)»  by  Dempster  and  Lemer's  (1950) 
method 

b^DL(y)  ~ [h^ANOVA(x)  *P(0)(  1 "P(0))]/C^  (2"7) 

and  gain  on  the  underlying  scale  (GoL(y))  was  obtained  by 

GoL(y)  ~[i*h^DL(y)*<^y(0)]/<^y(0)  ~ i*h^DL(y)  (2"8) 

where 

GoLCy)  is  the  predicted  gain  on  the  underlying  scale,  obtained  by  conversion 
of  heritability  from  binary  to  underlying  by  Dempster  and  Lemer’s 
method,  expressed  in  units  of  standard  deviations; 
h^DL(y)  is  the  heritability  on  the  underlying  scale; 
c is  the  ordinate  of  the  normal  density  function  at  the  threshold; 
i is  the  selection  intensity; 

Oy(0)  is  the  phenotypic  standard  deviation  of  the  underlying  trait;  and 
h^ANovA(x)  is  the  individual  heritability  on  the  binary  scale  (x)  as  estimated  by 
ANOVA  on  0/1  data. 

This  gain  on  the  underlying  scale,  GoL(y),  was  converted  to  a binary  scale  gain  by 

+ 00  +00 

Gdl(x)  = jf(y)dy-  jf(y)dy 


(2-9) 


where 
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Gdl(x)  is  the  predicted  gain  on  the  binary  scale  transformed  from  the  underlying 
continuous  gain  obtained  by  Dempster  and  Lemer's  method; 

f(y)  is  the  standard  normal  probability  density  function; 

Z(i)  is  the  normal  deviate  on  the  underlying  scale  corresponding  to  the  mean 
incidence  in  the  progeny  population;  and 

Z(o)  is  the  normal  deviate  on  the  underlying  scale  corresponding  to  the  mean 
incidence  in  the  base  population. 

The  area  under  the  normal  curve  defined  by  the  integrals  in  equation  2-9  was 
computed  using  the  PROBNORM  function  from  SAS®  (SAS  Institute,  1 990). 

A third  gain  on  the  binary  scale  (Gpj^x))  (called  PT  method)  was  obtained  using  the 
same  steps  described  for  the  second  type  above  (DL  method);  however,  the  underlying  gain 
in  equation  2-8  was  computed  using  the  true  underlying  heritability  instead  of  the  underlying 
heritability  converted  from  the  binary  scale.  This  mimics  a method  that  when  applied  to 
binary  threshold  traits  would  give  the  true  imderlying  heritability.  Some  methods  approach 
to  this  "perfect"  method  for  some,  but  not  for  all,  situations  (van  Vleck  1972;  Mantysaari  et 
al.  1991;  Magnussen  and  Kremer  1995).  The  question  is,  if  the  true  heritability  on  the 
underlying  scale  is  known,  can  it  be  accurately  converted  to  realized  gain  on  an  observable 
binary  scale? 

A fourth  gain  was  computed  using  the  method  proposed  by  Schall  (1991)  and 
implemented  in  the  program  ASREML  by  Gilmour  et  al.  (1997)  (here  referred  to  as 
SCHALL  method).  This  is  based  on  a method  suggested  by  Stiratelli  et  al.  (1984)  for 
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solution  of  fixed  and  random  effects  and  dispersion  (variance)  parameters  in  a generalized 
linear  mixed  model  context.  Contrary  to  the  fifth  method  described  below,  the  SCHALL 
method  does  not  estimate  the  scale  or  dispersion  parameter  (here  the  error  variance 
component).  It  automatically  sets  it  to  one  (o^v^^y)scHALL  ~ !)•  The  estimates  of  variance 
components  on  the  underlying  scale  obtained  from  this  method  were  used  to  estimate  the 
heritability 

SCHALL  (y)  “ 4*a^{|fy)sCHALL/(‘^^fl;y)SCHALL  <^^w(y)SCHALL) 

where  o^,(y)scHALL  <^^w(y)scHALL  ^6  t^e  estimate  of  variance  components  obtained  by  the 
SCHALL  method.  The  genetic  gain  on  the  underlying  scale  was  predicted  as  GscHALL(y)  “ 
i*h^scHALL(y)-  This  gain  (GscHALL(y))  was  then  converted  to  a binary  scale  as  described  in 
equation  2-9  using  GscHALL(y)  place  of  GoL(y).  The  resulting  gain  on  the  binary  scale  is 

GscHALL(x)* 

A fifth  gain  was  obtained  by  the  method  developed  by  Wolfmger  and  O’Connell 
(1993)  and  implemented  in  the  GLIMMIX  macro  written  in  SAS®  (Littell  et  al.  1 996)  (WO 
method).  Unlike  the  SCHALL  method,  in  the  WO  method  both  components  o^f(y)wo  and 
a^w(y)wo  3re  estimated.  After  these  components  were  estimated  using  GLIMMIX,  the  gain  on 
the  underlying  scale  (Gwo(y))  was  computed  as  described  for  SCHALL  method  and  then 
converted  to  genetic  gain  on  the  binary  scale  (now  called  Gwo(x))- 

Heritability  on  the  Binary  Scale 

Three  heritabilities  on  the  binary  scale  were  studied  to  examine  the  differences  in 
gains;  a)  the  heritability  using  variance  components  estimated  using  0/1  data  (h^ANovA(x))»  b) 
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the  heritability  on  the  binary  scale  converted  from  the  true  heritability  by  Dempster  and 
Lemer’s  method  (h^TDL(x))»  c)  the  realized  heritability  (h\(x))-  The  heritability  h^TOL(x)  was 
computed  from  equation  2-7  substituting  for  h^xDL(y)  with  the  true  heritability  and  solving  for 
h^NovA(x)  (now  h^DL(x)  )•  The  realized  heritability  was  computed  by 

h^R(x)  ~ Gr(x/P  (2-10) 

Heritability  and  Gain  on  the  Underlying  Scale 

Although  the  major  focus  of  this  chapter  was  the  estimation  of  heritability  and 
prediction  of  gain  on  the  binary  observable  scale,  heritabilities  and  gains  on  the  underlying 
scale  were  obtained  to  check  the  simulation  program  and  to  evaluate  the  methods  on  the 
same  basis  as  other  authors  (van  Vleck  1972;  Mantysaari  et  al.  1991),  but  with  a larger 
number  of  samples.  Three  types  of  gains  on  the  underlying  scale  were  obtained:  predicted 
gain  using  the  true  heritability  (GpT(y)),  predicted  gain  using  the  heritability  based  on  variance 
components  obtained  from  data  analysis  on  the  underlying  scale  (Ganova(v))  ^nd  realized  gain 
obtained  as  the  difference  of  the  means  in  the  underlying  of  BP  and  PS  populations  (GR(y)). 
These  gains  were  obtained  by 


PT(y)  ~ i*h^*<^y(0) 

(2-11) 

ANOVA(y)  “ i*h  ANOVA(y)*<^y(0)ANOVA 

(2-12) 

R(y)  ~ 1 P(i)  ■ P(0)l 

(2-13) 

where 

Gpj(y)  is  the  predicted  gain  on  the  underlying  scale  using  the  true  heritability; 
GANovA(y)  is  thc  predicted  gain  on  the  underlying  scale  using  the  heritability 
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estimated  from  variance  components  obtained  from  analysis  of  the 
continuous  trait; 

GR(y)  is  the  realized  gain  on  the  underlying  scale; 
i is  the  selection  intensity; 

h^  is  the  true  heritability  on  the  underlying  continuous  scale; 

^y(o)  (<^y(o)ANovA)  is  the  square  root  of  the  phenotypic  variance  obtained  by  summing 
the  true  (estimated)  variance  components,  that  is,  a^y(o)  = a^f^y)  + o^w(y)»  where 
a^f(y)  and  a^^y)  are  the  true  (estimated)  variance  components  between  and 
within  half-sib  families,  respectively; 

Oy(o)ANovA  is  the  square  root  of  the  phenotypic  variance  obtained  by  summing 
the  variance  components  estimated  by  ANOVA; 

P(i)  is  the  mean  of  the  progeny  of  the  selected  parents  for  the  underlying  trait; 

P(o)  is  the  mean  of  the  parental  base  population  for  the  underlying  continuous  trait; 
h^ANovA(y)  is  the  individual  heritability  for  the  underlying  continuous  trait  estimated 
from  variance  components;  and 
h^  is  the  true  heritability  on  the  underlying  continuous  scale. 

Results  and  Discussion 

Heritability  and  Gain  on  the  Underlying  Scale 

Genetic  gains  and  heritabilities  on  the  underlying  scale  are  presented  in  Figures  2-1 
and  2-2,  respectively.  The  predicted  gains  using  the  true  heritability  (Gpx(y),equation  2-11) 
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Figure  2-1.  Realized  gain  on  the  underlying  scale  [GR(y)]  and  predicted  gain  from 
heritability  estimated  using  the  imderlying  continuous  data  [Ganova(v)]  or  the  true 
underlying  heritability  [Gpx(y)],  expressed  in  units  of  the  underlying  trait,  for  several 
incidences  of  the  undesirable  phenotype  and  true  heritability  levels  on  the  imderlying 
scale  (h^).  Each  point  is  an  average  of 300  samples  of  simulated  experiments  for  GR(y)  and 
GANovA(y)>  while  Gpx(y)  was  calculated  once  for  each  true  heritability  and  incidence. 


Heritability  on  Underlying  Scale  Heritability  on  Underlying  Scale 
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Figure  2-2.  Heritabilities  on  the  underlying  scale  obtained  from  binary  data  by  WO 
[h^wo(y)]>  SCHALL  [h^scHALL(y)]5  ^nd  DL  methods  [h^oL(y)]»  for  several  incidences  of 
the  imdesirable  phenotype  and  true  heritability  levels  on  the  underlying  scale  (h^). 
Each  point  is  an  average  of  300  samples  of  simulated  experiments  and  the  vertical 
bars  are  95%  confidence  intervals. 
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and  using  the  estimate  of  heritability  based  on  variance  components  of  the  underlying 
continuous  trait  (GANovA(y)»  equation  2- 1 2)  are  very  close  to  the  realized  gain  from  a complete 
cycle  of  selection  (GR(yj,  equation  2-13).  This  suggests  that  the  underlying  trait  used  to 
generate  the  binary  trait  was  adequately  simulated,  since  the  results  here  agree  with  the 
theory  that  ANOVA  methods  are  appropriate  when  the  assumptions  are  satisfied. 

The  method  suggested  by  Dempster  and  Lemer  (1950)  has  been  the  most  frequently 
used  in  animal  breeding  for  estimation  of  heritability  of  threshold  traits  and  by  far  the  most 
studied  (Dempster  and  Lemer  1 950;  van  Vleck  1 972;  McGuirk  1 989;  Mantysaari  et  al.  1991; 
van  Vleck  and  Gregory  1992;  Magnussen  and  Kremer  1995).  Following  this  method,  after 
heritability  is  estimated  based  on  0/1  data  using  ANOVA  or  another  method,  it  is 
transformed  to  an  underlying  scale  as  in  equation  2-7  (called  theDL  method,  in  this  chapter). 
In  Figure  2-2,  it  is  shown  this  method  closely  estimates  the  tme  heritability  at  incidences 
between  25  and  75%.  However,  DL  method  overestimates  the  true  heritability  of  the 
underlying  trait  for  extreme  incidences  of  the  binary  trait.  The  overestimation  is  larger  for 
larger  values  of  heritability.  In  general  terms,  these  results  agree  with  most  studies  evaluating 
Dempster  and  Lemer’s  method  (Dempster  and  Lemer  1950;  van  Vleck  1972;  Olausson  and 
Ronningen  1975;  McGuirk  1989;  Mantysaari  et  al.  1991;  van  Vleck  and  Gregory  1 992).  This 
method  has  been  shown  to  be  dependent  on  the  incidence  of  the  observable  binary  trait,  the 
tme  heritability  of  the  underlying  trait,  and  to  some  extent  on  the  model  used  in  the 
estimation  (parent-offspring  analysis  vs  half-sib  analysis).  Overestimation  at  extreme 
incidences  and  higher  heritabilities  has  been  shown  by  van  Vleck  (1972)  and  by  McGuirk 
(1 989),  referring  to  unpublished  results  of  McGuirk  and  Thompson.  However,  van  Vleck  and 
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Gregory  (1992)  observed  the  overestimation  associated  with  DL  method  occurs  at  lower 
heritability  and  Mantysaari  et  a/.  ( 1 99 1 ) observed  no  overestimation  at  all.  Mantysaari  et  al. 
(1991)  pointed  out,  as  a possible  reason  for  the  absence  of  overestimation  in  their  study,  the 
fact  that  the  transformation  was  done  sample  by  sample  and  using  the  incidence  of  that 
sample,  not  the  presumed  level  of  incidence.  However,  we  used  the  same  approach  in  our 
study  and  still  we  observed  overestimation.  A possible  reason  for  lack  of  overestimation  in 
Mantysaari  and  coworkers’  study  is  the  fact  they  used  a low  heritability  (h^=0.20).  At  low 
heritabilities,  we  observed  the  DL  method  overestimates  only  slightly  the  true  underlying 
heritability  (Figure  2-2). 

Other  methods  of  estimation  used  here  were  those  based  on  generalized  linear  mixed 
models  (Schall  1991;  Wolfinger  and  O'Connell  1993),  a powerful  technique  particularly 
appropriate  for  estimation  of  fixed  and  random  effects  for  nonnormal  data,  including  binary 
threshold  data.  The  algorithms  by  Schall  (1991)  and  by  Wolfinger  and  O’Connell  (1993), 
despite  their  incorporation  into  two  major  software  packages  for  data  analysis  (SAS®  and 
ASREML),  have  not  been  studied  in  the  context  of  estimation  of  heritability  or  gain 
prediction  for  binary  traits.  The  method  using  the  algorithm  from  Wolfinger  and  O’Connell 
(1993)  (WO  method,  h^wo(y))  tracks  the  true  heritability  only  for  intermediate  incidences 
(Figure  2-2).  At  extreme  incidences,  the  heritabilities  estimated  by  this  method  overestimate 
the  true  heritability,  as  occurred  with  DL  method.  However,  the  bias  associated  with  WO 
method  is  larger  than  that  associated  with  DL  method.  On  the  other  hand,  the  method  using 
the  algorithm  suggested  by  Schall  (1991)  (SCHALL  method,  h^scHALL(y))  underestimates  the 
true  heritability  (Figure  2-2).  The  underestimation  is  accentuated  at  extreme  incidences  and 
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at  higher  heritabilities.  The  bias  associated  with  the  SCHALL  method  is  close  to  that 
associated  to  DL  method,  but  in  the  opposite  direction. 

The  SCHALL  method  produced  estimates  of  variance  components  for  the  random 
effects  (here  family  variance  component)  smaller  than  those  produced  by  WO  method, 
particularly  at  extreme  incidences  (Figure  2-3).  These  results  confirm  those  obtained  by 
Wolfinger  and  O’Connell  (1993)  with  salamander  mating  data.  The  SCHALL  method 
constrains  the  error  variance  component  to  one  (when  this  is  smaller  than  one  if  not 
constrained),  resulting  in  smaller  estimates  for  the  other  components.  The  combination  of 
smaller  family  variance  component  estimates  (n^f(y)scHALL)  ^^id  larger  error  variance  estimates 
(cf^w(y)scHALL)  ill  th®  SCHALL  method  resulted  in  underprediction  of  heritability  (Figure  2-2). 

Heritabilitv  and  Gain  on  the  Binary  Scale 

Realized  gain  is  a standard  method  to  compare  prediction  techniques  for  continuous 
traits.  However,  this  approach  has  not  been  used  to  compare  methods  of  estimation  of 
heritability  for  binary  threshold  traits,  even  knowing  the  major  use  of  heritability  is  the 
prediction  of  gain.  Moreover,  the  interest  of  the  breeder  is  in  the  realized  genetic  gain  on  an 
observable  scale.  In  this  section,  heritabilities  and  gains  on  a binary  (incidence)  scale  are 
presented  and  compared  with  those  realized  by  a cycle  of  mass  selection  in  the  binary  trait. 

Heritabilities  on  the  binary  scale  are  presented  on  Figure  2-4,  each  point  being  an 
average  of  300  samples.  For  most  situations  the  heritability  using  variance  components 
estimated  from  0/1  data  directly  (h^ANovA(x)>  equation  2-5)  tracks  that  obtained  on  the  binary 
scale  converted  from  the  true  heritability  by  Dempster  and  Lemer’s  (1950)  method  (h^TDL(x)> 


25 


u 


Uh 


8 


> 

a 

• ^ 


1.02 
1.00 
0.98 
0.96  • 
0.94  - 
0.92  - 
0 90  - 


h^=0.1 

O- O 


r 


/ 


/ 


/ 


0.88 


— I— 
20 


— a 


w(y)WO 


w(y)SCHALL 


w(y)TRUE 


I 

40 


I 

60 


\ 


\ 


\ 


I 

80 


100 


Incidence  (%) 


Incidence  (%) 


Figure  2-3.  Family  and  within  family  variance  components  estimated  by  WO  method 
(<^^«(y)WO  and  a^w(y)wo)  and  SCHALL  method  (a^  f^y)scHALL  and  o^w(y)scHALL)  and  the  true 
variance  components  (o^  f(y)XRUE  and  o^„(y)XRUE),  for  several  levels  of  incidence  of  the 
undesirable  phenotype  and  true  heritability  levels  on  the  underlying  scale  (h^).  Each 
point,  except  the  true  parameters  (o^f(y)TRUE  and  o^w(y)TRUE)  is  an  average  of  300  samples 
of  simulated  experiments  and  the  vertical  bars  are  95%  confidence  intervals. 
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Figure  2-4.  Realized  heritability  on  the  binary  scale  [h^R(x)]  and  heritabilities  obtained 
from  variance  components  estimated  directly  on  0/1  data  [h^ANovA(x>]  obtained  by 
transforming  the  true  underlying  heritability  to  binary  scale  by  Dempster  and  Lemer's 
method  [h^xoLCxJ^  for  several  levels  of  incidence  of  the  undesirable  phenotype  and  true 
heritability  levels  on  the  underlying  scale  (h^).  Each  point  is  an  average  of  300  samples 
of  simulated  experimerits  and  the  vertical  bars  are  95%  confidence  intervals. 
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Figure  2-4).  These  two  heritabilities  (h^ANovA(x)5  h^xDL(x))  provide  good  estimates  of  the 
realized  heritability  (h^R(x>,  equation  2-10)  at  low  true  values.  However,  they  underestimate 
the  realized  heritability  at  high  true  heritabilities,  especially  when  the  incidence  of  the 
undesirable  phenotype  is  high.  Further,  the  heritabilities  obtained  by  Dempster  and  Lemer’s 
(1950)  method  and  those  obtained  from  variance  components  estimated  using  0/1  data  are 
symmetric  aroimd  50%  incidence,  but  the  realized  heritability  is  not  symmetric  (Figure  2-4). 
Therefore,  when  comparing  methods  of  estimation  of  binary  threshold  traits,  it  is  not  a good 
strategy  to  examine  only  incidences  smaller  than  50%  or  only  incidences  larger  than  50%. 
The  results  obtained  from  one  side  of  50%,  may  not  be  valid  for  the  other  side  as  is  assumed 
for  symmetry. 

Realized  and  predicted  gains  on  the  binary  scale  are  presented  in  F igure  2-5 . For  traits 
of  low  heritability  (h^  < 0.3)  all  methods,  including  the  ANOVA  method  based  on  0/1  data, 
result  in  predicted  gains  close  to  that  realized  by  selection  across  all  incidences  (5  to  95%). 
However,  the  WO  method  from  Wolfinger  and  O’Connell  (1993)  overestimated  the  gain  at 
incidences  of  the  undesirable  phenotype  larger  than  75%,  even  for  low  true  heritabilities 
(h^  = 0.1).  For  traits  of  moderate  to  high  heritabilities  (h^  > 0.5)  and  at  high  incidences  (p  > 
60%),  both  DL  and  WO  methods  overestimate,  while  ANOVA  and  SCHALL  underestimate 
the  realized  gain.  SCHALL  method  is  the  least  biased  among  those  tested. 

For  true  heritabilities  smaller  than  0.5,  the  maximum  realized  gain  and  gains 
predicted  by  ANOVA  and  SCHALL  methods  occur  near  75%  incidence  of  the  undesirable 
phenotype.  Sohn  and  Goddard  (1979),  using  real  data  of  fusiform  rust  incidence  in  pines  (a 
fungal  disease  of  pines),  observed  that  the  gain  predicted  using  ANOVA  method  on  0/1  data 


28 


Figure  2-5.  Realized  gain  on  the  binary  scale  [Gr(^)]  and  gains  predicted  by  WO 
[Gwo(x)],  DL  [Gdl(xJ,  SCHALL  [Gschall(x)],  ANOVA  [Ganova(x)],  and  PT  methods 
[GpT(x)i,  expressed  in  percent  of  incidence,  for  several  levels  of  incidence  of  the 
imdesirable  phenotype  and  true  heritability  levels  on  the  underlying  scale  (h^).  Note 
that  the  graphs  for  Gr(^)  and  Gpx(x)  are  superimposed  and  that  each  point  is  an  average 
of  300  simulated  experiments  and  the  vertical  bars  are  95%  confidence  intervals. 
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tracks  the  realized  gain,  with  a slight  underestimation.  Those  authors  suggested  that 
maximum  gain  from  mass  selection  would  occur  at  incidences  of  70%.  The  predicted  gain 
peaks  near  75%,  because  it  is  a combination  of  two  factors  (see  equation  2-6):  the  heritability 
on  the  binary  scale  (which  peaks  at  50%,  Figure  2-4)  and  selection  differential  (which  is  the 
incidence  in  the  base  population  and  peaks  at  100%).  These  two  factors  are  jointly 
maximized  in  a region  between  50  and  1 00%. 

Although  many  complex  methods  have  been  suggested  to  analyze  binary  threshold 
traits  (Dempster  and  Lemer  1950;  Schall  1991;  Wolfinger  and  O'Connell  1993),  it  is  clear 
from  the  results  presented  here  that  heritability  estimated  from  0/1  data  directly  results  in 
predicted  gain  close  to  the  realized  gain  for  traits  of  low  heritability  (h^  < 0.3)  and  for  traits 
of  high  heritability  with  incidences  smaller  than  75%.  Many  estimates  of  heritability 
obtained  for  binary  threshold  traits  in  animals  and  plants,  when  transformed  to  the  underlying 
scale  by  Dempster  and  Lemer’s  (1950)  method,  will  be  smaller  than  0.3  (e.g..  Lush  et  al. 
1948;  Robertson  and  Lemer  1949;  Rockwood  and  Goddard  1973;  Tomar  and  Kumar  1991; 
Dieters  et  al.  1996).  Therefore,  at  least  for  these  published  results,  the  estimation  of 
heritability  directly  from  0/1  data  is  appropriate  if  the  goal  is  to  use  this  heritability  to  predict 
gain  from  mass  selection  on  the  observable  binary  scale  in  a single  environment. 

Elston  (1977)  has  defended  the  use  of  the  heritability  directly  on  the  binary  scale. 
According  to  him,  although  heritability  estimated  on  the  underlying  is  independent  of  the 
incidence  of  the  binary  trait,  it  is  dependent  on  the  distribution  of  the  unobservable, 
underlying  continuous  trait,  which  is  dependent  on  the  mode  of  inheritance  of  the  trait 
(monogenic  or  polygenic).  However,  the  use  of  heritability  estimated  directly  on  0/ 1 data  has 
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also  received  some  criticisms,  the  major  one  being  the  dependence  of  the  estimate  on  the 
incidence,  not  allowing  comparison  of  estimates  obtained  from  populations  with  different 
incidences  (Dempster  and  Lemer  1950;  Gianola  1982).  Another  criticism  is  the  dependence 
among  effects  in  the  linear  model  (Gianola  1 982).  However,  as  pointed  out  by  Guiard  et  al. 
(1985),  to  estimate  variance  components  there  is  no  requirement  about  independence  of  the 
effects  in  the  model;  what  is  in  fact  required  is  the  absence  of  correlation  among  those  effects 
(in  this  chapter  f(0)i  and  W(o)ij,  model  2-1).  These  authors  showed  that  this  correlation  is  zero 
for  binary  traits,  providing  support  for  estimation  of  variance  components  directly  from  0/1 
data. 

Currently,  the  approach  used  to  compare  estimates  of  heritability  from  different 
estimation  methods  is  to  compare  them  with  the  true  heritability  of  the  imderlying  trait. 
However,  another  attractive  strategy  is  the  use  of  the  realized  gain  on  the  binary  observable 
scale  as  a standard  of  comparison.  In  this  study  the  true  heritability  on  the  imderlying 
continuous  scale  was  used,  first,  to  predict  gain  on  the  underlying  scale  and  then  this  gain 
was  transformed  to  the  binary  scale  (Gpx(x>,  equation  2-8  substituting  for  h^oL(Y)  with  the  true 
heritability).  This  mimics  the  situation  in  which  a perfect  method  of  estimation  of  heritability 
on  the  underlying  scale  for  threshold  traits  can  be  developed.  The  gain  obtained  in  this  way 
matches  perfectly  the  gain  realized  from  selection  (Gr(x),  Figure  2-5).  So,  if  the  true 
underlying  heritability  is  known,  then  this  can  be  converted  to  gain  on  the  binary  scale  and 
matches  perfectly  the  realized  gain  from  mass  selection.  This  supports  the  strategy  of  most 
authors  (e.g.,  van  Vleck  1972;  Olausson  and  Ronningen  1975;  McGuirk  1989)  in  comparing 


31 


their  methods  of  estimation  of  heritability  for  binary  threshold  traits  with  the  true  underlying 
heritability. 

Conclusions 

Although  many  complex  methods  have  been  suggested  to  analyze  binary  threshold 
traits  (Dempster  and  Lemer  1950;  Schall  1991;  Wolfinger  and  O'Connell  1993),  it  is  clear 
from  the  results  presented  here  that  heritability  obtained  from  0/1  data  directly  (without 
transformation  of  data  or  parameter  estimates)  results  in  predicted  gain  close  to  the  realized 
gain  for  traits  of  low  heritability  (h^  < 0.3)  and  for  traits  of  high  heritability  with  incidences 
smaller  than  75%.  Many  traits  in  animal  and  plant  breeding  fall  into  this  category.  One  of 
the  limitations  of  this  study  is  that  it  is  restricted  to  mass  selection  in  a single  test 
environment  (single  mean  incidence).  The  results  found  here  may  not  apply  to  situations  in 
which  test  environments  with  different  incidences  are  combined  together  for  prediction  of 
gain  or  for  more  complex  types  of  selection  indices. 

The  method  by  Dempster  and  Lemer  (1 950),  which  is  the  most  extensively  evaluated 
and  used  methodology  for  estimating  heritability  of  threshold  traits,  and  the  methods  from 
Wolfinger  and  O’Connell  (1993)  and  Schall  (1991),  based  on  generalized  linear  mixed 
models,  do  not  result  in  better  prediction  of  the  gain  on  the  observable  binary  scale  than 
methods  using  the  heritability  estimated  directly  from  binary  data. 

The  results  of  this  chapter  support  the  strategy  of  most  authors  in  comparing  their 
methods  of  estimation  of  heritability  of  binary  threshold  traits  with  the  frue  or  estimated 
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heritability  on  the  underlying  scale  (van  Vleck  1972;  Olausson  and  Ronningen  1975; 
McGuirk  1989;  Mantysaari  et  al.  1991).  In  this  chapter  it  was  shown  that  methods  of  analysis 
of  binary  data  that  produce  estimated  heritabilities  on  the  underlying  scale  close  to  the  true 
heritability  also  result  in  accurate  predictions  of  gain  on  the  binary  observable  scale. 


CHAPTER  3 

LINKAGE  DISEQUILIBRIUM  AND  COMPONENTS  OF  GENETIC  GAIN  FROM 

SELECTION  ON  BINARY  THRESHOLD  TRAITS 

Introduction 


Prediction  of  genetic  gain  is  one  of  the  most  importaint  contributions  of  quantitative 
genetics  for  animal  and  plant  breeding  (Hallauer  and  Miranda  1988).  Usually  prediction  of 
gain  is  based  on  estimates  of  heritability  obtained  from  the  first  generation  base  population. 
Predictions  of  gain  for  future  generations  are  valid  only  if  the  heritability  remains  constant 
or  changes  in  a predictable  manner.  Selection  can  affect  the  heritability  by  reducing  the 
additive  variance  through  change  in  the  gene  frequencies  in  the  population  or  through  linkage 
disequilibrium  (gametic  phase  disequilibrium).  The  effect  of  change  on  gene  frequencies  is 
more  important  for  long-term  prediction  of  gain  and  will  not  be  addressed  in  this  chapter. 
The  effect  of  linkage  disequilibrium,  on  the  other  hand,  is  more  important  for  the  short-term 
prediction  of  gain. 

For  continuous  traits,  the  effect  of  linkage  disequilibrium  caused  by  truncation 
selection  on  the  additive  variance  is  well  documented  in  the  literature  (Bulmer  1971 ; 1976; 
Falconer  1989;  Wray  and  Hill  1989;  Verrier  et  al.  1989;  Gomez-Raya  and  Burnside  1990; 
Dekkers  1992;  Villanueva  et  al.  1993).  The  first  author  to  consider  this  effect  was  Pearson, 
in  1903.  Later,  Lush  (1945,  pp.  140-144)  described  this  effect  with  some  detail,  but  without 
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quantifying  its  importance.  Finally,  Bulmer  (1971;  1976)  clearly  established  the  theory  and 
quantified  the  effect  of  linkage  disequilibrium  on  the  additive  variance,  under  truncation 
selection. 

However,  many  important  traits  in  plant  and  animal  breeding  are  not  continuous, 
instead,  they  are  measured  as  binary  traits  where  individuals  are  scored  in  one  out  of  two 
categories  (e.g.,  dead/alive,  infected/uninfected).  Few  studies  have  evaluated  the  effect  of 
linkage  disequilibrium  caused  by  selection  on  these  traits  (Roff  1994a),  although  some 
authors  have  noted  the  need  for  incorporating  it  into  methods  of  gain  prediction  (Reich  et  al. 
1972). 

In  quantitative  genetics,  binary  traits  are  usually  treated  as  binary  threshold  traits  in 
which  the  binary  phenotype  is  underlain  by  an  unobservable  continuous  trait  (sometimes 
called  liability).  When  the  underlying  continuous  trait  is  larger  than  some  threshold  value, 
one  phenotype  is  observed  (e.g.,  infected),  otherwise  the  alternative  phenotype  is  observed 
(e.g.,  uninfected).  Selection  on  binary  threshold  traits  is  comparable  to  truncation  selection 
on  a continuous  trait.  Therefore,  it  affects  the  additive  variance  of  the  underlying  trait  by 
linkage  disequilibrium  the  same  way  as  predicted  by  Bulmer  (1971;  1976).  However,  the 
effect  of  linkage  disequilibrium  on  gain  of  those  traits  can  be  quite  different  from  that  on 
continuous  traits,  because  of  some  peculiarities  of  binary  traits.  First,  when  selecting  on  a 
binary  trait,  the  percentage  of  individuals  selected  can  never  be  greater  than  the  relative 
incidence  of  the  desirable  phenotype  in  the  population.  Therefore,  for  a constant  threshold, 
as  the  percentage  of  individuals  with  the  desirable  phenotype  increases  across  generations 
of  selection,  selection  pressure  necessarily  decreases.  This  results  in  a pattern  of  change  for 
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the  additive  variance  across  generations  different  from  that  observed  for  continuous  traits 
(Roff  1994a). 

Second,  since  gain  in  the  binary  threshold  trait  is  obtained  by  the  difference  in  areas 
(probabilities)  delimited  by  the  threshold  and  the  distributions  in  the  base  and  progeny 
populations,  changes  in  either  the  mean  or  the  spread  (variance)  of  the  progeny  distribution 
will  affect  the  gain  in  the  binary  trait,  as  shown  in  this  chapter.  The  effect  of  linkage 
disequilibrium  on  the  mean  of  the  underlying  trait  has  been  considered  by  Roff  (1994a). 
However,  the  effect  on  the  spread  of  the  distribution  has  not  been  considered  in  the  literature. 

The  current  study  extends  Roffs  (1994a)  work  by  also  incorporating  the  effect  of 
linkage  disequilibrium  on  the  spread  of  the  distribution  in  the  equations  of  prediction  of  gain. 
Algebraic  equations  of  gain  prediction  are  derived  which  account  for  the  effects  of  linkage 
disequilibrium  on  both  the  mean  and  the  spread  of  the  distribution  of  the  imderlying 
continuous  trait.  Simulations  show  the  effect  of  linkage  disequilibrium  on  both  the  short-  and 
long-term  prediction  of  gain  from  selection  on  binary  traits. 

Theoretical  Development 


Background  and  Development 

There  are  situations  in  which  the  genotype  frequency  at  two  or  more  loci  considered 
jointly  are  not  the  same  as  that  expected  from  the  gene  frequencies  in  individual  loci.  In  these 
situations  it  is  said  the  population  is  in  linkage  disequilibrium  or  gametic  phase 
disequilibrium,  although  this  disequilibrium  does  not  depend  on  the  presence  of  linkage 


36 


between  loci  (Falconer  1 989).  As  mentioned  by  Falconer  (1989),  linkage  disequilibrium  can 
be  caused  by  mixing  populations  of  different  gene  frequencies,  genetic  drift,  and  selection. 

Truncation  selection  tends  to  choose  individuals  with  genetic  values  more  similar 
among  them  than  among  individuals  randomly  chosen  in  the  population  (Falconer  1989). 
The  objective  is  to  choose  individuals  having  only  favorable  alleles.  However,  this  is  rare  in 
a finite  population  with  an  infinite  number  of  loci  controlling  the  trait  (Hospital  and  Chevalet 
1996).  The  outcome  is  that  in  a given  individual,  favorable  alleles  at  one  locus  must  be 
associated  with  unfavorable  alleles  at  other  loci  (Falconer  1989).  This  association  of 
favorable  and  vmfavorable  alleles  builds  up  a negative  covariance  between  gene  effects  in  the 
loci  affecting  the  trait.  Since  the  additive  variance  for  a given  trait  is  computed  considering 
all  loci  affecting  that  trait,  this  negative  covariance  reduces  the  additive  variance. 

Recurrent  cycles  of  truncation  selection  increases  this  state  of  disequilibrium  in  the 
population,  reducing  the  additive  variance.  Bulmer  (1 97 1 ; 1 976)  showed  that  for  truncation 
selection  on  a continuous  trait  controlled  by  an  infinite  number  of  unlinked  minor  genes,  the 
additive  variance  is  reduced  in  the  first  four  generations  of  selection  and  then  a limit  is 
reached  where  there  is  no  further  reduction  in  additive  variance.  If  selection  is  relaxed,  under 
random  mating  the  additive  variance  returns  to  its  original  level.  A similar  effect  is  expected 
on  the  vmderlying  continuous  trait  when  selection  is  applied  on  the  binary  threshold  trait, 
since  for  the  continuous  trait  selection  on  the  binary  trait  is  a kind  of  truncation  selection  in 
which  only  individuals  below  or  above  a threshold  value  are  selected. 

Under  a threshold  model,  genetic  gain  in  the  binary  trait  is  entirely  dependent  on  the 
changes  in  the  distribution  of  the  underlying  continuous  trait.  One  way  this  distribution  is 
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changed  is  by  selection.  Besides  the  change  in  the  mean,  selection  also  reduces  the  variance 
of  the  underlying  trait.  The  reduced  additive  variance  of  the  imderlying  trait  has  two  effects 
on  the  genetic  gain  of  the  binary  trait:  one  direct  and  another  indirect. 

The  indirect  effect  occurs  through  a reduced  gain  in  the  continuous  trait  as  a 
consequence  of  the  reduced  additive  variance  in  the  past  generation.  For  example,  in  Figure 
3-1  the  gain  in  the  underlying  trait  considering  the  effect  of  linkage  disequilibrium  on  the 
mean,  Gy(2)M,  is  smaller  than  that  without  disequilibrium,  Gy(2)N;  consequently,  the  gain  in  the 
binary  trait  in  the  first  case  is  smaller  (areas  B + C<B  + C + D + E,  in  Figure  3-1).  Note  that 
this  indirect  effect  does  not  occur  in  the  generation  the  variance  was  reduced  (first 
generation),  but  one  generation  later.  The  reason  is  that  gain  in  a given  generation  is  a 
function  of  the  additive  variance  of  the  previous  generation. 

The  direct  effect  of  linkage  disequilibrium  on  the  gain  of  the  binary  trait  occurs  as 
an  immediate  consequence  of  the  smaller  spread  of  the  distribution.  For  example,  compare 
the  distribution  labeled  as  Y(,)k  with  that  labeled  Y(,)n,  in  Figure  3-1.  In  Y(,)n  the  effect  of 
linkage  is  ignored,  and  then  the  gain  in  the  binary  trait  in  the  first  generation  is  given  by  the 
area  A.  When  the  direct  effect  of  linkage  is  considered,  the  gain  is  given  by  the  areas  A + B. 
Therefore,  the  reduction  in  variance  due  linkage  disequilibrium  causes  an  extra  gain  in  the 
binary  trait.  Note  also,  that  this  effect  occurs  in  the  generation  the  additive  variance  is 
reduced  (first  generation),  not  one  generation  later  as  for  the  indirect  effect.  In  this  chapter 
the  direct  and  indirect  effects  will  be  called  effects  on  the  spread  and  on  the  mean  of  the 
distribution,  respectively.  Since  both  effects  depend  on  the  (additive)  variance,  the  term 
spread  will  be  used  in  order  to  reduce  confusion. 


V(2)N  ^(^*{2)N>  *^y(0)) 

V(2)M  “ 

V(2)MK  ~ N(j^(2)PyiK'  *^2)) 
V(1)N  ~ ^^1)N'  ^0)) 

y(1)K  ~ 

VpN  ~ ^0^(O)N'  ®^0)) 


Threshold 
^K8)c  * ® (desirable) 


l^(2)N  l^(2)M  ” ^^(2)MK  ^*(1)N  ” ^^(1)K  ^^(0)N 

<T2y(0)  ^ «^1)  ^ 

^(g)c " ^ (undesirable  phenotype) 


^^(2)N  P(2)M 


l\o)N  T 


®y(i)N"  ®y<i)K 


Gy(2)M"  ®y(2)MK 


»y(2)N 


Figure  3-1.  Distributions  of  the  underlying  continuous  trait  for  the  base  population 
(generation  zero)  and  for  the  first  and  second  generations  of  selection,  showing  the 
effect  of  linkage  disequilibrium  on  the  gain  of  the  binary  threshold  trait.  The 
subscripts  refer  to;  0, 1,2  = generations  0, 1,  and  2;  k=  linkage  effect  on  the  spread 
of  the  distribution;  M = linkage  effect  on  the  mean;  N = no  effect  of  linkage.  The 
letters  A,...,  F in  the  dashed  region  represents  areas  or  incidences  as  mentioned  in  the 
text. 
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In  the  following  sections  algebraic  equations  of  gain  prediction  for  binary  traits  are 
derived.  For  all  derivations  in  this  chapter,  an  infinite  number  of  loci  and  of  individuals  are 
assumed,  and  then  reduction  in  variance  due  to  change  in  gene  frequencies  is  ignored.  In  the 
first  section  the  threshold  model  is  presented.  The  second  section  presents  a general  formula 
of  prediction  of  genetic  gain  for  binary  threshold  traits.  The  next  three  sections  present 
equations  of  gain  prediction  ignoring  linkage  disequilibrium,  assuming  linkage 
disequilibrium  affecting  only  the  mean  of  the  distribution,  and  affecting  both  the  mean  and 
the  spread  of  the  distribution,  respectively.  After  that,  the  effect  of  linkage  disequilibrium  on 
the  short-  and  long-term  gains  is  quantitatively  evaluated  for  several  levels  of  heritability  and 
incidences,  using  the  equations  developed  in  previous  sections. 

Threshold  Model 

The  threshold  model,  proposed  by  Wright  (1934),  has  been  the  most  widely  used  in 
quantitative  genetic  analysis  of  binary  traits.  According  to  this  model,  the  measured  binary 
trait  is  underlain  by  a continuous  trait  controlled  by  many  minor,  additive  genes  and 
environmental  factors.  The  discreteness  in  the  binary  trait  is  given  by  a threshold,  such  that 
if  the  underlying  continuous  trait  is  larger  than  some  threshold  value  one  phenotype  is 
observed  (e.g.,  infected),  otherwise  the  alternative  phenotype  is  observed  (e.g.,  uninfected) 
(Figure  3-1). 

Considering  a completely  randomized  design,  the  underlying  continuous  trait,  yj(g)<,, 
and  the  binary  threshold  trait,  Xj(g)<,,  can  be  specified  by  the  following  models.  For  the 


continuous  trait 
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y i(g)c  M'(g)c  ^(g)c  ®i(g)c 

where 

yj(g)(;  is  the  underlying,  unobservable  continuous  trait  measured  on  the  i*  individual, 
in  the  g“’  generation,  under  the  c*  linkage  disequilibrium  state, 
g = 0,  . . . , t generations; 

c is  the  state  of  linkage  disequilibrium  (c=  K means  the  spread  of  the  distribution  is 
affected  by  linkage  disequilibrium,  c=M  means  the  mean  of  the  distribution 
is  affected  by  linkage  disequilibrium,  and  c=N  means  no  effect  of  linkage 
disequilibrium); 

P(g)<,  is  the  general  mean  of  the  g“’  generation,  under  the  c*  state  of  linkage 
disequilibrium; 

aj(g)c  is  the  random  additive  effect  of  the  i***  individual,  ~ N(0,  o\(g));  and 

Cj(g)c  is  the  random  environmental  and  non-additive  genetic  effects  associated  with  the 
i*  individual,  ej(g)<,  ~ N(0,  o^e)- 

The  binary  trait,  Xi(g)<.,  can  be  specified  for  each  individual  by  imposing  a threshold, 

T,  in  the  population  of  yi(g>c’s  as 

Jo  if  y < T (desirable  phenotype) 

^i(g)c  \ if  > X (undesirable  phenotype) 

General  Formula  for  Prediction  of  Genetic  Gain  for  Binary  Threshold  Traits 


Genetic  gain  in  binary  threshold  traits  is  given  by  the  difference  in  incidence  of  the 
trait  in  the  populations  before  (base  population)  and  after  selection  (progeny  population). 
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Therefore,  the  general  formula  of  genetic  gain  prediction  for  these  traits  is 

T T 

^ X (g)  c I ^(y (g)  c )^y (g)  c ~ I ^(y (g-o  c )^y (g-i>  c 

- 00  - 00 

^(g)c  ^(g-l)c 

= |f(z)dz-  |f(z)dz  (3-1) 

- 00  - 00 

where 

Gx  (g)  c is  the  genetic  gain  in  the  binary  trait  for  the  g“’  generation,  assuming  the 
c*  state  of  linkage  disequilibrium; 

f(y(g)  c)  is  the  normal  probability  density  function  for  the  population  in  the  g* 
generation,  assuming  the  c*  state  of  linkage  disequilibrium  and  with 

y(g)c  ~ ^(M'(g)c»  ^ y(g))’ 

P(g)e  is  the  mean  of  the  g*  generation  population,  assuming  the  c*  state  of  linkage 
disequilibrium; 

o^y(g)  is  the  phenotypic  variance  of  the  underlying  continuous  trait  in  the  g* 
generation; 

f(z)  is  the  standard  normal  probability  density  function,  with  z ~ N(0,1); 

T is  the  threshold  value;  and 

Z(g)  e is  the  standardized  normal  deviate  for  the  threshold  in  the  g*  generation, 
assuming  the  c*  state  of  linkage  disequilibrium. 

Gain  in  the  Binary  Trait  without  Linkage  Disequilibrium  [G^  (g)N] 


Selection  on  the  observable  binary  trait  accumulates  favorable  alleles  in  the 
underlying  unobservable  continuous  trait,  moving  the  mean  of  the  distribution  from  P(o)N  to 
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in  Figure  3-1 . With  a fixed  threshold,  this  change  in  the  mean  of  the  underlying  trait 
increases  the  frequency  of  individuals  with  the  desirable  phenotype,  that  is,  those  with 
Xi(g)c=0,  resulting  in  genetic  gain  in  the  binary  observable  trait.  In  the  simplest  case,  ignoring 
changes  in  additive  variance  due  to  linkage  disequilibrium  and  due  to  change  in  gene 
frequencies,  the  genetic  gain  in  the  binary  trait  for  the  first  generation  after  selection,  Gx(i)n» 
is  computed  by  using  the  general  equation  of  gain  prediction  (equation  3-1)  with  g=l,  c=N, 
and 


y(0)N  ~ N(P(0)n»  *^^y(0)) 
y(i)N  ~ N(P(|)n,  cy^ y(0)) 


(3-2) 


Z(0)N  ~ [T  - P(0)N]'<^y(0) 

Z(1)N  ~Z(o)N  - [i(0)*h^y(0)] 


(3-3) 

(3-4) 


Equations  3-3  and  3-4  are  obtained  by  expressing  the  threshold  value,  T,  as  a standard 
normal  deviate  for  the  base  and  first  generation  populations,  respectively.  Equation  3-4  is 
derived  as  follows 


Z(1)N  “ [T  ■ ft(l)N]'Cfy(0) 


(3-5) 


but. 


l^(i)N  ~ ft(o)N  Gy(i)N;  and 


Gy(i)N  = i(o)*h^(o)*^Jy(o)>  for  mass  selection 


(3-6) 

(3-7) 


Substituting  equations  3-6  and  3-7  in  equation  3-5 


Z(1)N  ~ [(T  - P(1)N)/Oy(0)]  - [Gy(i)N/Oy(0)]  — Z(0)N  ■ [i(0)*h^y(0)l 


(3-8) 


For  the  second  generation,  the  gain  ignoring  linkage  disequilibrium,  Gx(2)n>  is 


computed  using  the  same  principles  with  g=2,  c=N,  and 


Y(2)N  ~ N(^(2)N5  <^^y(0))5 

Z(2)N  “ Z(o)N  - [i(0)  A(0)  / (®^A(0)  ■*" 
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(3-9) 


with  y(,)N  and  Z(,)n,  as  in  equations  3-2  and  3-4,  respectively.  Equation  3-9  is  derived  as 
follows 

Z(2)N  ~ [T  - P(2)N]'^y(0) 

but, 

l^(2)N  “ M’(1)N  Gy(2)N> 

Gy(2)N  = i(i)*h^y(0)*<^y(0)  (assumed  no  change  in  h^y(o)  and  Oy(o)) 

then, 

Z(2)N  “ [(T  - P(|)N)/*^y(0)]  ■ [Gy(i)N/<Jy(0)] 

~ Z(i)N  - [i(l)*h^  y(0)]  (3"10) 

Substituting  equation  3-8  in  3-10 

Z(2)N  “ Z(o)N  ■ [i(0)  '*"i(l)]*h^y(0) 

but, 

b^y(O)  “ ® A(o/<^^y(0) 

o^y(o)  = o^A(o)  + (assuming  constant  through  generations) 

then, 

Z(2)N  ~ Z(0)N  ■ [i(0)  A(0)  A{0)  ®^e)] 

Generalizing  for  the  g*  generation,  the  gain  in  the  binary  scale  assuming  no  linkage 
disequilibrium,  Gx(g)N,  is  computed  by 

T T 

^x(g)N  “ l^(y(g)N)<^y(g)N " |f(y(g-i)N)^y(g-i)N 


- 00 


- 00 


(3-11) 


^x(g)c=  jf(z)dz-  |f(z)dz 

- 00  - 00 


y(g)N  ~ N(M(g)N5  <^^y(0)) 


g-1 


^(g)N  “ Z(0)N  - h^(0)  X i(g) 


Z(g)N  “ Z(0)N  “ 


'(g)N 


j=0 


(0)N  " L'^^A(O)  / (^^A(0) 


g-1 

<j"e)]  E 

j=0 


i(g),  for  g k 1 , and 


~ Z(o)N5  for  g ~ Oj 


G X (g)  N is  the  genetic  gain  in  the  binary  trait  in  the  g*  generation,  ignoring  linkage 
disequilibrium; 

f(y(g)N)  is  the  normal  probability  density  function  for  the  g*  generation  population, 
ignoring  linkage  disequilibrium,  with  y(g)N  ~ N(p(g)N,  o^y(o)); 

P(g)N  is  the  mean  of  the  g***  generation  population,  ignoring  linkage  disequilibrium; 

Z(g)N  is  the  standardized  normal  deviate  for  the  threshold  in  the  g*  generation, 
ignoring  linkage  disequilibrium; 

o^y(o)  is  the  phenotypic  variance  of  the  underlying  continuous  trait  in  the  base 
population  (generation  0); 

o^A(o)  is  the  additive  variance  in  the  in  the  base  population; 

Is  the  environmental  and  non-additive  genetic  variance  (assumed  constant  across 


generations); 
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Gy(g)N  is  the  genetic  gain  in  underlying  scale  for  the  g“’  generation,  ignoring  linkage 
disequilibrium;  and 

h^y(o)  is  the  individual  heritability  on  of  the  underlying  trait  in  the  base  population. 

Gain  in  the  Binary  Trait  when  the  Mean  of  the  Underlying  Trait  is  Affected  bv  Linkage 
Disequilibrium  [Gx(g)M] 

Reduction  in  the  additive  variance  of  the  underlying  trait  reduces  the  heritability  and 
the  phenotypic  variance  in  the  current  generation,  reducing  the  gain  for  the  trait  in  the 
following  generation.  This  causes  the  frequency  of  the  desirable  binary  phenotype  in  the 
following  generation  to  be  smaller  than  that  expected  ignoring  linkage  disequilibrium. 
Considering  the  effect  of  linkage  only  on  the  mean  (i.e.,  the  effect  on  spread  is  ignored),  the 
reduced  gain  in  the  underlying  trait  occurs  in  the  second  generation.  This  is  illustrated  in 
Figure  3 - 1 . When  linkage  disequilibrium  is  ignored,  first  and  second  generations  gain  in  the 
underlying  trait  are  equal  (Gy(i)N  = Gy(2)N).  However,  when  linkage  is  assumed  to  affect  the 
mean  of  the  distribution,  the  gain  in  the  second  generation  is  smaller  than  that  in  the  first 

(Gy(2)M  ^ Gy(l)N)- 

For  the  first  generation  after  selection,  the  genetic  gain  in  the  binary  trait  assuming 
linkage  disequilibrium  affects  only  the  mean,  Gx(i)m>  is  the  same  as  that  when  linkage 
disequilibrium  is  ignored  completely.  However,  in  the  second  generation  the  gain  Gx(2)m 
differs  from  that  obtained  in  the  absence  of  linkage.  In  this  case,  equation  3-1  is  used  with 


g=2,  c=M,  and 
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X(2)M  ~ N(fi(2)M>  ®^y(0)) 

Z(2)M  “ [T  - M’(2)M]/CJy(0)  ~ Z(o)n  " i(0)*h^y(0)*^y(o/^y(0)  " i(l)*^^y(l)*^y(l/*^y(0)  (3“12) 

y(i)N  and  Z(i)n  are  the  same  as  in  equations  3-2  and  3-4,  respectively.  Equation  3-12  is 
obtained  by 


Z(2)M  ~ [T  - P(2)M]'<^y(0) 


but 


M^(2)M  “ M’(1)N  Gy(2)M 


Gy(2)M  - i(l)*h^y(l)*®y(l) 


then 


Z(2)M  ~ [(T  - P(l)N)'<^y(0)]  ■ [Gy(2)M'<^y(0)] 

“ Z(|)N  - [i(l)*h^y(l)*Oy(l/<Jy(0)] 

Substituting  equation  3-4  in  3-13 

Z(2)M  ~ Z(o)N  ■ [i(0)*h^y(0)*f^y(0)  /®y(0)]  “ [i(l)*h^y(l)**^y(l/®y(0)] 


(3-13) 


Generalizing,  the  genetic  gain  in  the  binary  trait  for  the  g‘^  generation  assuming 
linkage  disequilibrium  affects  only  the  mean,  Gx(g)M,  is  given  by 


i 1 

^x(g)M  = lf(y(g)M)dy(giM-  Jf(y(,.i)M)dy  (g-1) M 

- 00  - 00 

^(g)M  ^(g-l)M 

= I f(z)dz  - I f(z)dz 


(3-14) 


- 00 


- 00 


with 


Y(g)M  ~ N(M’(g)M>  <^^y(0)) 
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g-i 

Z(g)M  = Z(0)N  - (l/Oy(0))  X *(i)*^^(j)*’^y(i) 

j=o 

I 8"*  i 

/ 2 2 2/2  2 

Z(g)M  = Z(0)N  -[1/V<^A(0)+  CTeIZ  [i(i)*(JAa)W^Aa)+  CTeI,  for  g ^1,  and  (3-15) 

j=0 

Z(g)N  “ Z(o)N»  for  g = 0. 

Equation  3-15  is  obtained  by  considering 

h^(i)  ^ ®^A(j/o^(j) 

*^^y0)  “ ^^A(j)  o^E  (assuming  environmental  variance  constant). 

The  additive  variance  in  the  g‘*’  generation  of  selection,  for  the  underlying  trait,  can 
be  computed  by  the  recurrence  formula  suggested  by  Bulmer  (1971) 

o^A(g)  = 0.5*  { 1 - i(g-,)*[i(g-i)  - Z(g-i)c]*h2y(g.i)}*oVi)  + 0.5*0%)  (3-16) 

where 

Gx(g)M  is  the  genetic  gain  in  the  binary  trait  for  the  g*  generation,  assuming  the  mean 
of  the  distribution  of  the  underlying  trait  is  affected  by  linkage  disequilibrium; 
Z(g)M  is  the  standardized  normal  deviate  for  the  threshold  in  the  g*  generation,  with 
linkage  disequilibrium  affecting  only  the  mean  of  the  underlying  trait; 
f(Y(g)M)  is  the  normal  probability  density  function  for  the  g"*  generation  population, 
with  linkage  disequilibrium  affecting  only  the  mean  of  the  underlying  trait, 
with  y(g)M  ~ N(p(g)M,  o^y(o));  and 


i(g)  is  the  selection  intensity  in  the  g*  generation. 
Other  terms  are  as  previously  defined. 
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Gain  in  the  Binary  Trait  when  both  the  Mean  and  the  Spread  of  the  Distribution  of  the 
Underlying  Trait  are  Affected  bv  Linkage  Disequilibrium  [Gx(g)MK] 

Reduction  in  the  additive  variance  of  the  underlying  trait  by  selection  has  two  effects 
on  the  gain  of  the  binary  trait.  First,  as  discussed  above,  it  reduces  the  gain  in  the  underlying 
trait,  reducing  the  gain  in  the  binary  trait.  Second,  reducing  the  variance  in  the  progeny 
results  in  a distribution  of  the  progeny  that  is  less  spread  (Compare  Y(,)k  vs.  Y(,)n  and  Y(2)mk 
vs.  Y(2)m»  in  Figure  3-1).  This  causes  the  frequency  of  the  binary  phenotype  to  be  larger  or 
smaller  (depending  on  the  original  frequency  of  the  binary  trait)  than  that  expected  when  no 
reduction  in  the  variance  occurs.  As  mentioned  before,  reduction  in  the  gain  of  the 
imderlying  trait,  as  a result  of  linkage  disequilibrium,  occurs  only  in  the  second  generation 
after  selection  starts.  On  the  other  hand,  the  effect  of  linkage  disequilibrium  on  the  binary 
gain  through  its  effect  on  the  spread  occurs  in  the  first  generation  after  selection. 

In  the  first  generation,  only  the  effect  on  spread  occurs  and  the  gain  in  the  binary 
scale,  Gx(i)mk,  is  computed  by  equation  3-1  using  the  parameters 


y(0)N  N(|i(0)N?  ^\(0)) 

y(l)K 

(3-17) 

Z(0)N  ~ [T  “ M-(0)N]/^y(0) 

(3-18) 

Z(l)K  ” [Z(0)N  " i(0)*h^y(0)K^y(o/^y(l)) 

(3-19) 

Equations  3-18  and  3-19  are  obtained  by  expressing  the  threshold  value,  T,  as  a 
standard  normal  deviate  for  the  base  and  progeny  populations,  respectively.  Equation  3-19 
is  obtained  by 

Z(1)K  ~ [T  - P(l)K]'<^y(l) 


but, 
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Z(0)N  “ [T  - M’(0)N]'*^y(0) 
l^(l)K  ~ M'(0)N  Gy(l)K5 

Gy(i)K  = i(o)*h^y(o)*<Jy(o)>  for  muss  selection 

then, 


Z(|)K  ~ [T  - (A(O)K  ■ Gy(i)K]/CTy(i) 

“ {[T  ■ M-(0)K]foy(l)}  ■ {Gy(l)Kfoy(l)} 

“ [2^(0)N  ■ i(0)*h  y(0)](r^y(o/<^y(l)) 

In  the  second  generation,  both  the  effect  on  mean  and  on  spread  occur,  then  the 
genetic  gain  in  the  binary  scale,  Gx(2)mk>  is  computed  by  equation  3-1  using  the  parameters 


y(2)MK  ~ N(|r(2)MK»  ^^y(2)) 


'(2)MK 


= [!/■/ 


2 2 I 2 i" 

CT  A(2)~^  O' £^{^(0)^1  O'  O 


A(0) 


X t^G)  O A(,)^  ^jo  A(j)~^0  r] 


j=0 


with  y(i)K  and  Z(i)k  the  same  as  in  equations  3-17  and  3-19,  respectively.  Z(2)mk  is  obtained 


by 


Z(2)MK  “ [T  - M'(2)MK]/^y(2) 


Z(l)K  “ [Z(0)N  “ i(0)*h^y(0)K^y(o/^y(l)) 

^(2)1^  ~ M'COK  Gy(2)MK? 

Gy(2)MK  “ i(i)*h^y(i)*^y(i)* 


then 


Z(2)MK  “ {[T  - |^(l)K]/^y(2)}  ‘ {Gy(2)MK/^y(2)} 
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= [Z(l)K*tJy(l)  - i(l)*h^y(l)*<Jy(l)]/<Jy(2) 

~ (^y(l/^y(2))*[(<^y(o/^y(l))(Z(o)N  " i(0)*h^y(0))  " i(l)*h^y(l)] 

“ (^y(o/^y(2))*  {[(Z(0)N*<^y(0)y®y(0)]"i(0)*(<^^A(o/t^y(0))"  i(l)*(^  A(l)*^y(l)y(<^Vl)*®y(0))} 


(l/ay(2))*{Z(0)*Cy(0)  - i(0)*[<T  A(o/<^y(0)]  ■ i(l)*[^  A(l)*^y(l)]} 

1 

(l/ay(2))*  {Z(0)N*<Jy(0)  - X *(j)*‘^^A(j/^y(j) 

j=0 


but, 


^ y(g)  ^ A(g)  ^ I 


then, 


'(2)MK 


=[i/V^l(2)+o-E]{Z(0)^crI(0)+crE  -'Ll^(j)(T\u)^^|crm+crl] 


j=0 


Generalizing  for  the  g*  generation,  the  genetic  gain  in  the  binary  scale  considering 


the  effects  of  linkage  on  the  mean  and  spread,  (g)  mk>  is  computed  by 


1 1 

^x(g)MK  = Jf(y  (g)MK  (g)MK  ■ - |f(y  (g-l)MK)^y(g-l)MK 

- 00  - 00 

^(g)MK  ^(g-l)MK 

= j f(z)dz  - I f(z)dz 


- 00 


- 00 


with 


y(g)MK  ~ N(P(g)MK5  y(g)) 


(3-20) 


g-1 


Z(g)MK  = [1  / ~ £ t'o)  , for  g ^ l , and 

j=0 


'(g)N 


= Z(0)N,  for  g = 0, 
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where 

G X (g)  MK  is  the  genetic  gain  in  the  binary  trait  for  the  g*  generation,  when  both  the 
mean  and  the  spread  of  the  distribution  of  the  underlying  trait  are  affected  by 
linkage  disequilibrium; 

Z(i)K  is  the  standardized  normal  deviate  for  the  threshold  in  the  first  generation,  with 
linkage  disequilibrium  affecting  only  the  spread  of  the  imderlying  trait; 

f(y(i)ic)  is  the  normal  probability  density  function  for  the  first  generation  population, 
with  linkage  disequilibrium  affecting  only  the  spread  of  the  underlying  trait, 
with  y(,)K  ~ N(P(,)k,  o2y(,)); 

Z(g)MK  is  the  standardized  normal  deviate  for  the  threshold  in  the  g“’  generation,  when 
both  the  meein  and  the  spread  of  the  underlying  trait  are  affected  by  linkage 
disequilibrium;  and 

f(y(g)MK)  is  the  normal  probability  density  function  for  the  g***  generation  population, 
with  linkage  disequilibrium  affecting  both  the  mean  and  spread  of  the 
underlying  trait,  with  y(g)MK  ~ N(P(g)MK,  o^y(g)). 

Other  terms  are  as  previously  defined. 

Numerical  Evaluation  of  the  Effect  of  Linkage  Disequilibrium 
Effect  of  T.inkage  Disequilibrium  on  the  Short-Term  Genetic  Gain  of  Binary  Traits 


Genetic  gain  for  the  first  generation  after  selection  is  the  gain  most  often  predicted 
in  plant  arid  animal  breeding.  Therefore,  it  is  of  interest  to  quantify  the  effect  of  linkage 
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disequilibrium  on  this  gain.  As  said  before,  the  only  effect  of  disequilibrium  on  the  gain  of 
binary  traits  in  the  first  generation  is  that  through  the  reduced  spread  of  the  distribution 
(direct  effect  of  the  reduced  additive  variance). 

In  this  section,  the  percent  contribution  of  the  effect  of  linkage  disequilibrium  to  the 
total  gain  in  the  binary  trait  was  evaluated  for  the  first  generation  of  selection.  This 
contribution,  pGx(i)K,  was  computed  by 

pGx(i)K  ~ [(Gx(i)mk  ■ Gx(l ) n)/Gx(  1 )mk]  * 1 00 

The  percent  contribution,  pGx(i)K,  was  computed  for  four  heritabilities  (0.1, 0.3, 0.5, 
and  0.9)  and  18  incidences  of  the  binary  trait  (5, 10, 15, 20, 25, 35, 40, 45, 50, 55, 60, 65, 70, 
75,  80,  85, 90,  and  95%).  In  the  computation  of  Gx(|)mk  (equation  3-20)  and  Gx(1)n  (equation 
3-11)  the  areas  under  the  normal  curves  were  computed  using  the  SAS®  function 
PROBNORM  and  the  values  computed  using  the  fimction  PROBIT  (SAS  Institute, 
1990). 

The  effect  of  linkage  disequilibrium  to  the  total  gain  through  its  effect  on  the  spread 
of  the  distribution  is  presented  in  Figure  3-2.  Overall,  the  contribution  of  spread  increases 
with  increasing  heritability  and  incidence  of  the  desirable  binary  phenotype  (Figure  3-2a). 
When  the  additive  variance  is  large  (high  heritability),  the  absolute  reduction  in  this  variance 
is  large  (see  equation  3-16),  resulting  in  a strong  effect  of  the  spread  of  the  distribution  on 
the  total  gain.  However,  the  reduction  in  additive  variance  does  not  affect  the  portion  of  the 
binary  gain  that  ignores  linkage  disequilibrium,  Gx(1)n.  With  a constant  gain  ignoring  linkage 
disequilibrium  and  an  increasing  effect  of  spread  with  heritability,  the  contribution  of  spread 


increases. 


53 


o o o o o o o 

m CO  cNj  T-  'f- 

I 


(%)  »1^(1.)X9 


(%)  >l(l.)xod 


Figure  3 -2.  Effect  of  linkage  disequilibrium  on  the  binary  gain  through  change  in  spread  of  the 
distribution,  for  five  levels  of  heritability  [h2]  and  several  incidences  of  the  desirable  phenotype  in 
the  base  population,  for  the  first  generation  of  selection:  a)  percent  contribution  of  the  spread 
[pGx(l)K];  and  b)  total  gain  on  binary  scale  [Gx(l)MK]. 
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On  the  contreiry,  increasing  the  selection  intensity  (reducing  the  incidence  of  the 
desirable  phenotype)  reduces  the  contribution  of  the  spread  to  the  total  gain.  Increasing  the 
selection  intensity,  increases  the  gain  of  the  binary  trait  ignoring  linkage  disequilibrium, 
Gx(i)n>  since  the  gain  for  the  underlying  trait  increases,  Gy(,)N.  However,  increasing  the 
selection  intensity  also  reduces  the  additive  variance  (spread)  (equation  3-16)  and  then 
increases  the  contribution  of  spread.  However,  the  gain  ignoring  linkage  increases  faster, 
becoming  a larger  portion  of  the  total  gain,  reducing  the  contribution  of  the  spread.. 

Overall  the  contribution  of  spread  is  maximum  when  the  total  gain  in  the  binary  trait 
is  minimum  (Figures  3-2).  Under  a high  incidence  of  the  desirable  phenotype,  the 
contribution  of  spread  to  the  total  gain  may  be  as  high  as  82%,  for  all  heritabilities.  However, 
the  total  gain  at  high  incidences  of  the  desirable  trait  is  not  very  high,  since  the  selection 
intensity  in  this  situation  is  low  (Figure  3-2b). 

For  heritabilities  between  0. 1 and  0.7  and  low  incidences  of  the  desirable  phenotype, 
the  contribution  of  spread  is  negative,  indicating  that  for  these  combinations  the  gain  in  the 
binary  trait  is  decreased  by  spread  caused  by  linkage  disequilibrium.  Therefore,  at  least 
theoretically,  successive  generations  of  random  mating  before  deployment  would  decrease 
the  effect  of  linkage  disequilibrium,  increasing  the  gain,  for  these  combinations  of 
heritability  and  incidences. 

For  continuous  traits,  it  is  reasonable  to  ignore  linkage  disequilibrium  in  the  first 
generation  after  selection,  since  the  effect  of  disequilibrium  for  these  traits  occurs  only  at  the 
second  generation  after  selection.  However,  for  binary  threshold  traits  the  effect  of  linkage 
disequilibrium  occurs  since  the  first  generation  after  selection  and,  depending  on  the  true 
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underlying  heritability  and  the  incidence  of  the  trait,  this  effect  may  represent  an  important 
portion  of  the  total  first  generation  gain. 

Effect  of  Linkage  Disequilibrium  on  the  Long-Term  Genetic  Gain  of  Binary  Traits 

The  effect  of  linkage  disequilibrium,  through  its  effects  on  the  mean  and  the  spread 
of  the  distribution  of  the  underlying  trait,  was  also  evaluated  for  the  long-term  prediction  of 
gain  in  binary  traits.  Genetic  gains  for  the  binary  trait  were  predicted  in  the  absence  of 
linkage  disequilibrium  [Gx(g)N,  equation  3-11],  in  the  presence  of  linkage  disequilibrium 
affecting  only  the  mean  of  the  distribution  [Gx(g)M,  equation  3-14],  and  in  the  presence  of 
disequilibrium  affecting  both  the  mean  and  the  spread  of  the  distribution  of  the  underlying 
trait  [Gx(g)MK»  equation  3-20].  In  the  prediction  of  the  two  last  types  of  gain,  the  additive 
variance  in  a given  generation  was  computed  by  the  formula  suggested  by  Bulmer  (1971) 
and  presented  as  equation  3-16.  Gains  were  predicted  for  three  levels  of  initial  heritabilities 
of  the  imderlying  trait  (0.1, 0.5, 0.9)  and  four  initial  incidences  of  the  binary  trait  (5, 25, 50, 
95%),  for  20  generations  of  selection  on  the  binary  trait.  The  results  are  presented  in  Figure 
3-3. 

Overall,  the  gain  predicted  for  the  binary  trait  considering  the  effect  of  disequilibrium 
only  on  the  mean,  Gx(g)M,  is  close  to  that  considering  the  effect  of  disequilibrium  on  both  the 
mean  and  the  spread,  Gx(g)MK  (Figure  3-3).  But  they  both  differ  of  the  gain  ignoring  linkage 
disequilibrium,  Gx(g)N.  This  trend  is  stronger  for  traits  of  low  heritability  with  a low  initial 
incidence  of  the  desirable  phenotype,  and  in  the  first  1 0 generations.  There  is  a tendency  for 
the  binary  gain  under  the  effects  of  linkage  disequilibrium  to  be  smaller  than  that  ignoring 
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linkage  at  incidences  smaller  than  50%  and  be  larger  at  incidences  larger  than  50%. 
Therefore,  for  low  initial  incidences  (5  and  25%)  the  gain  including  the  effects  of 
disequilibrium  is  smaller  than  that  ignoring  linkage  disequilibrium  in  earlier  generations,  but 
larger  at  later  generations  when  the  incidence  in  the  population  is  greater  than  50%.  In 
summary,  gain  predicted  ignoring  is  biased  downward  for  incidences  smellier  than  50%  and 
upward  for  incidences  smaller  than  50%. 

Effect  of  Linkage  Disequilibrium  on  the  Heritabilitv  of  the  Underlying  Trait 

The  long-term  effect  of  linkage  disequilibrium  on  the  heritability  of  the  underlying 
trait  is  of  interest  for  two  reasons.  First,  many  researchers  transform  their  estimates  of 
heritability  of  binary  traits  to  an  underlying  continuous  scale  (Dempster  and  Lemer  1950; 
Gianola  1 982).  Second,  for  binary  threshold  traits  the  selection  intensity  is  limited  by  the 
frequency  of  the  desirable  phenotype  in  the  population.  Therefore,  the  effect  of  linkage 
disequilibrium  is  expected  to  be  different  from  that  on  continuous  traits  where  a fixed 
selection  intensity  can  be  assumed  across  generations  (Roff  1994a). 

The  effect  of  linkage  disequilibrium  considering  the  effect  on  both  the  mean  and  the 
spread  of  the  distribution  of  the  underlying  trait  was  computed  for  three  levels  of  initial  true 
heritabilities  of  the  underlying  trait  (0. 1 , 0.5, 0.9)  and  four  initial  incidences  of  the  favorable 
binary  phenotype  (5, 25,  50, 95%),  for  20  generations  of  selection  on  the  binary  trait.  The 
results  are  presented  in  Figure  3-4. 

Overall,  the  heritability  initially  decreases,  but  then  it  increases  again  reaching  the 
initial  values.  This  is  in  agreement  with  the  results  presented  by  Roff  (1994a),  considering 
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the  effect  of  linkage  disequilibrium  only  on  the  mean.  When  the  desirable  phenotype  occurs 
in  a high  frequency  in  the  base  population  (e.g.,  95%),  the  reduction  in  heritability  is  very 
small  or  absent  depending  on  the  true  heritability.  In  situations  of  high  incidences,  the 
individuals  in  the  population  mate  almost  at  random  and  then  linkage  disequilibrium  is  very 
small,  resulting  in  a smaller  effect  on  the  heritability.  On  the  other  hand,  with  low  incidences 
of  the  desirable  phenotype,  the  effect  of  linkage  disequilibrium  is  very  pronounced  with 
considerable  reduction  in  the  heritability  in  the  few  first  generations.  Note  in  the  formula 
suggested  by  Bulmer  (1971),  and  presented  as  equation  3-16  in  this  chapter,  the  reduction 
in  additive  variance  is  proportional  to  the  heritability  of  the  trait  and  the  selection  intensity 
applied. 

For  continuous  traits,  the  additive  variance,  and  then  the  heritability,  decreases  under 
the  effect  of  linkage  disequilibrium  caused  by  selection,  but  after  some  generations  it  reaches 
an  equilibrium  where  the  variance  does  not  change.  For  threshold  traits,  as  shown  in  Figure 
3-4  and  also  observed  by  Roff  (1994a),  after  some  reduction,  the  heritability  increases  again 
reaching  an  equilibrium  close  to  the  original  value.  Therefore,  considering  the  effect  on 
heritability,  the  importance  of  linkage  disequilibrium  is  primarily  for  the  first  1 0 generations 
of  selection,  since  after  this  point  the  binary  gain  predicted  is  close  to  that  predicted  using 
estimates  obtained  from  first  generation  population  and  the  heritability  returns  to  its  initial 


values  (Figure  3-4). 
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Conclusions 

The  genetic  gain  of  binary  threshold  traits  is  affected  by  change  in  both  the  mean  and 
the  spread  (variance)  of  the  distribution  of  the  underlying  trait  caused  by  linkage 
disequilibrium. 

The  effect  of  spread  on  the  first  generation  gain  increases  with  the  heritability  of  the 

V 

trait  and  the  incidence  of  the  desirable  phenotype,  being  maximum  around  39%  when  the 
total  gain  is  minimal. 

Linkage  disequilibrium  can  increase  or  decrease  the  gain  for  binary  traits,  depending 
on  the  initial  incidence  of  the  desirable  phenotype  in  the  population  and  the  heritability  of 
the  trait. 

Linkage  disequilibrium  decreases  the  heritability  of  the  underlying  trait  in  the  first 
three  generations  of  selection,  but  soon  the  heritability  increases  again  returning  to  its  initial 
values. 

The  effect  of  linkage  disequilibrium  on  the  gain  of  binary  traits  occurs  already  in  the 
first  generation  of  selection,  since  the  genetic  gain  for  these  traits  is  not  only  a function  of 
the  parameters  in  the  base  population,  but  also  from  those  in  the  progeny  after  selection.  This 
contrasts  with  the  effect  of  disequilibrium  on  gain  of  continuous  traits,  where  gain  depends 
only  on  parameters  from  a previous  generation  and  then  the  effect  of  linkage  occurs  only 
after  the  second  generation.  Therefore,  formulae  of  prediction  of  gain  for  threshold  traits 
should  take  into  account  the  effect  of  disequilibrium  beginning  with  the  first  generation. 


CHAPTER  4 

PREDICTION  OF  PARENTAL  GENETIC  MERIT  FOR  BINARY  THRESHOLD 
TRAITS  COMBINING  DATA  FROM  ENVIRONMENTS  WITH 

DIFFERENT  INCIDENCES 

Introduction 


An  increase  in  data  from  offspring  of  a parent  increases  the  precision  of  its  breeding 
value  prediction,  and  thus  the  reliability  of  decisions  made  by  the  breeder.  Therefore, 
combining  data  from  many  sources  is  often  desirable  in  a breeding  program.  In  particular, 
for  binary  data  it  is  common  to  combine  data  from  progeny  growing  in  experiments  with 
different  incidences  of  the  desirable  phenotype  into  a single  predicted  breeding  value  for  the 
parent.  Also,  since  the  major  objective  of  prediction  is  to  compare  the  genetic  worth  of  the 
candidates  under  the  same  conditions  (e.g.,  at  a fixed  incidence),  a target  environment  is 
usually  defined.  For  example,  in  forestry  fusiform  rust  data  from  different  locations  with 
different  incidences  are  often  combined  to  predict  breeding  values  for  a single  target 
environment  having  50%  rust  incidence  (White  and  Hodge  1989,  p.l90). 

For  continuous  traits,  the  assumptions  of  normality,  variance  homogeneity,  and 
linearity  between  observations  and  the  values  being  predicted  are  generally  satisfied.  Under 
these  conditions,  information  from  multiple  tests  can  be  combined  into  a single  predicted 
breeding  value  without  major  problems  using  linear  methods.  Under  these  conditions,  best 
linear  unbiased  prediction  has  been  the  method  of  choice  (Mrode  1 996). 
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On  the  other  hand,  binary  data  violate  the  assumptions  above.  First,  binary  data  are 
not  normally  distributed.  Second,  the  variance  for  binary  traits  is  function  of  the  mean  and 
then  violates  the  assumtion  of  variance  homogeneity.Third,  true  (or  predicted)  breeding 
values  for  pairs  of  tests  with  different  mean  incidences  are  not  linearly  related,  making  the 
translation  of  the  information  between  tests  and  also  to  a target  envirorunent  difficult,  as 
shown  in  this  chapter.  In  fact,  Gianola  (1982)  demonstrated  that  for  categorical  data,  best 
linear  prediction  is  not  the  best  predictor. 

Several  alternatives  have  been  suggested  to  predict  breeding  values  for  categorical 
data,  including  binary.  An  obvious  alternative,  because  of  its  excellent  properties  for 
continuous  data,  is  BLUP  applied  to  the  binary  data  as  if  they  were  continuous.  This  method 
has  been  the  most  commonly  used  method  of  breeding  values  prediction  for  binary  data  in 
animal  breeding  (Foulley  and  Im  1989).  A similar,  but  modified  approach  has  been  used  in 
the  Cooperative  of  Forest  Genetics  Research  Program  (CFGRP),  at  University  of  Florida 
(White  and  Hodge  1 996).  In  this  approach,  best  linear  prediction  is  applied  directly  to  binary 
data,  but  the  variances  and  covariances,  computed  fi'om  functions  obtained  from  a large 
binary  trait  data  set  (fusiform  rust  disease)(Dieters  et  al.  1996),  are  allowed  to  vary  for 
environments  of  different  incidence.  This  method  is  currently  in  use  in  the  CFGRP,  but  no 
comparison  of  this  with  other  methods  has  been  attempted. 

A second  alternative  is  the  use  of  methods  based  on  generalized  linear  mixed  models 
assuming  a threshold  model  (Schall  1991;  Littell  et  al.  1996).  These  methods  have  the 
desirable  property  of  considering  the  real  distribution  of  the  data  and  modeling  the  errors 
adequately  (Littell  et  al.  1 996).  However,  this  approach  has  not  been  evaluated  as  a solution 
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for  the  problem  of  different  incidences  of  the  trait  at  different  levels  of  the  fixed  effects  (e.g., 
different  locations). 

A third  alternative  is  the  use  of  data  transformation.  Probit  transformation  eliminates 
some  of  the  problems  of  the  binary  scale  (Robertson  and  Lemer  1949;  Dempster  and  Lemer 
1950).  This  transformation  is  based  on  the  threshold  model  and,  therefore,  is  potentially 
useful  in  combining  data  from  environments  with  different  incidences.  Further,  since  the  data 
are  converted  to  a continuous  scale,  probit  allows  the  conversion  of  breeding  values  obtained 
in  a continuous  scale  to  a binary  scale  with  specific  target  incidences.  Other  transformations 
such  as  arcsin,  logarithm,  and  root  square,  were  shovra  by  Otsuk  (1991)  to  maintain  the 
proportion  between  the  variance  components.  Y et,  the  translation  of  the  predictions  obtained 
in  a transformed  scale  to  a target  incidence  is  problematic.  So,  only  the  probit  transformation 
was  considered  in  this  study. 

Unfortunately  studies  have  not  been  done  comparing  BLUP  (univariate  or 
multivariate),  generalized  mixed  models  and  the  BLP  method  currently  used  in  the  CFGRP, 
when  the  levels  of  the  fixed  effects  (i.e.,  mean  incidence)  are  different.  This  lack  of 
information  motivates  the  present  study. 

The  objective  of  this  study  is  to  compare  several  analytical  techniques  (linear, 
generalized  linear,  univariate  and  bivariate  methods)  for  their  ability  to  accurately  and 
precisely  predict  genetic  gain  of  binary  traits  when  the  data  are  combined  from  two  different 


environments  that  have  different  mean  incidences. 
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Material  and  Methods 

This  study  was  divided  into  two  stages.  In  the  first  stage,  several  methods  of 
prediction  were  compared  using  the  restriction  that  one  of  the  environments  always  had  a 
mean  incidence  equal  to  50%.  In  the  second  stage  only  two  methods  were  used,  representing 
generalized  mixed  models  and  univariate  BLUP  methods.  In  this  second  stage  several 
combinations  of  mean  incidences  in  the  two  environments  were  simulated. 

Data  Simulation 

Three  hundred  progeny  tests  were  simulated  for  each  combination  of  true  parameters. 
Each  sample  had  80  half-sib  families,  6 blocks  and  10  individuals  per  plot,  in  a randomized 
complete  block  design  in  two  environments  (300  samples*2  environments* 80  families*6 
blocks*  1 0 individuals  per  plot = 2,880,000  observations  for  each  set  of  true  parameters).  The 
true  heritability  for  the  underlying  continuous  trait  was  assumed  to  be  the  same  in  both 
environments  and  equal  to  0.35.  This  value  was  used  to  match  those  estimated  for  fusiform 
rust  in  pines:  0.21  to  0.47,  after  transformation  to  underlying  scale  from  estimates  presented 
by  Dieters  et  al.  (1996)  and  Rockwood  and  Goddard  (1973),  respectively.  Genotype-by- 
environment  interaction  on  the  underlying  scale  was  assumed  absent. 

Two  traits  were  simulated  following  the  threshold  model:  an  xmderlying  continuous 
trait  (yijki)  and  a binary  trait  (xykt),  which  depends  on  yy^i  and  the  threshold.  The  model  to 


simulate  the  continuous  trait  was 
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yijki  - ^ + Ej  + Bj(i)  + f^,  + Pijk  + Wjjki  (4- 1 ) 

where 

Yyiii  is  the  underlying  (unobservable)  continuous  trait; 
p is  the  general  mean; 

Ej  is  the  fixed  effect  of  the  i*  environment,  with  i =1,2; 

By  is  the  fixed  effect  of  the  j*  block  in  the  i*  environment,  with  j=l, ...  ,6; 
fk  is  the  random  effect  of  the  k*  half-sib  family,  with  k=l,  . . . , 80  and  4-  N(0,  o^f); 
Pijk  is  the  random  effect  of  the  plot  with  k***  family,  at  the  j*  block,  and  i* 
environment,  with  pyk~N(0,  o^p);  and 

Wjjki  is  the  random  effect  of  the  1*  individual  of  the  ijk*  plot,  with  1=1,  . . . , 1 0 and 
Wijki  ~ N(0,  o\). 

Covariances  among  all  effects  in  the  model  are  assumed  zero. 

The  binary  trait  (xyki)  was  generated  for  each  ykl*  individual  by  imposing  a threshold 
in  the  population  of  yyki’s  as 

Jo  if  yjjki  ^ T (desirable  phenotype) 

^ijki  “ 1 if  y..^  > T (undesirable  phenotype) 

Different  threshold  values  were  imposed  on  the  distribution  of  the  continuous  trait 
to  simulate  different  incidences  of  the  imdesirable  phenotype.  Many  authors  express  their 
results  as  function  of  the  incidence  of  the  desirable  phenotype.  Here  the  undesirable 
phenotype  was  chosen  to  be  consistent  with  the  terminology  used  in  disease  resistance,  that 
is,  high  incidence  of  the  undesirable  phenotype  meaning  a high  disease  incidence  in  the 


population. 


66 


In  the  first  stage  of  this  study,  the  incidence  in  one  environment  was  kept  fixed  at 
50%,  while  the  other  environment  had  incidences  of  1 0,  50,  or  90%.  In  this  stage  many 
methods  of  prediction  are  compared.  In  the  second  stage,  all  1 5 possible  combinations 
between  the  incidences  10,  30,  50,  70,  and  90%,  without  reciprocals,  were  simulated. 
However,  only  two  methods  of  analysis  were  compared. 

Methods  of  Genetic  Gain  Prediction 

In  the  first  stage,  nine  methods  were  compared.  These  methods  combine  different 
types  of  data  and  analytical  techniques  to  form  the  following  treatments: 

I - Univariate  Generalized  Linear  Mixed  Model  (GLIMM)  (Schall  1991 ; Littell 

et  al.  1996); 

II  - Best  Linear  Prediction  (BLP)  method  using  a function  to  estimate  parameters 

in  environments  with  different  incidences  (White  and  Hodge  1996); 

III  - Univariate  Best  Linear  Unbiased  Prediction  (BLUP)  on  binary  data 

analyzed  separately  by  site; 

IV  - Univariate  BLUP  combining  binary  data  across  two  sites; 

V - Univariate  BLUP  combining  plot  means  across  two  sites; 

VI  - Univariate  BLUP  combining  probit  transformed  data  across  two  sites; 

VII  - Bivariate  BLUP  on  binary  data; 

VIII  - Bivariate  BLUP  on  plot  means;  and 

IX  - Bivariate  BLUP  on  probit  transformed  data  across  two  sites. 
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For  the  BLP  method  (II),  a software  written  in  PROC  IML  from  SAS®  was  used 
(White  and  Hodge  1989).  For  all  other  methods  the  software  ASREML  was  used  (Gilmour 
et  al.  1997). 

In  method  I,  general  combining  abilities  (GCA)  for  the  80  parents  were  predicted  on 
the  underlying  scale  using  a univariate  generalized  mixed  model  approach  with  probit  as  the 
link  function  and  assuming  binomial  errors.  The  algorithm  was  that  presented  by  Schall 
(1991)  and  implemented  in  the  software  ASREML  (Gilmour  et  al.  1997).  Simultaneously, 
variance  components  were  estimated  in  that  same  underlying  scale.  The  GCA's  on  the 
underlying  scale  were  converted  to  a binary  scale  for  a target  environment  having  an 
incidence  equal  to  the  average  incidence  of  the  two  environments  using  the  equation 

T 

gx(p)i  = jf(y(w))dy(w)  = |f(z)dz  (4-2) 

- 00  - 00 

with 

y(w)  - N(gyi,  j 

y(0)N  ~ N(P(0)N5  <^^y(0)) 

Zw  = [Tp  - gyi]/0„ 

Tp=  M(0)n  Zx*cTy(o) 

where 

gx(p)i  is  the  general  combining  ability  (GCA)  in  the  binary  trait  for  the  i*  parent,  in 
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a target  environment  whose  population  has  an  incidence  of  the  undesirable 
phenotype  equal  to  p percent; 

f(y(w))  is  the  density  function  of  the  phenotypic  values  within  a half-sib  family  for  the 
underlying  scale,  with  y(„j  ~ N(gyj, 

gyi  is  the  predicted  general  combining  ability  on  the  underlying  scale; 

y(o)N  is  the  underlying  continuous  variable,  with  y(o>N  ~ N(P(o)n>  «J^(o)); 

is  the  standard  normal  deviate  for  the  threshold  relative  to  the  distribution  of 
phenotypic  values  within  families; 

Zj  is  the  standard  normal  deviate  for  the  threshold  relative  to  the  distribution  of 
phenotypic  values  in  the  whole  population; 

Tp  is  the  threshold  value  for  a population  with  p percent  of  the  undesirable  phenotype; 

P(o)N  is  the  population  mean,  assumed  zero  here; 

a^y(o)  is  the  estimate  of  the  phenotypic  variance  in  the  population  on  the  underlying 
scale; 

is  the  estimate  of  the  GCA  variance  component  on  the  underlying  scale; 

o^fe  is  the  estimate  of  variance  component  for  GCA-by-environment  interaction; 

o^p  is  the  estimate  of  variance  component  for  GCA-by-block  interaction;  and 
is  the  estimate  of  the  within  family  variance  component. 

In  method  II,  general  combining  abilities  on  a binary  scale  were  predicted  by  the  BLP 
method  as  used  with  fusiform  rust  data  by  White  and  Hodge  (1996)  for  a target  incidence  of 


50%  by 

g = X + C’V-'(y  - a) 


(4-3) 
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where 

g is  a vector  of  predicted  GCA's; 

I is  the  vector  of  expected  values  of  the  true  GCA's; 

C is  a variance-covariance  matrix  between  the  observed  data  and  the  true  GCA 
values; 

V is  a variance  covariance  matrix  between  the  observed  data; 
y is  the  vector  of  observed  data;  and 
a is  the  vector  of  estimated  fixed  effects. 

In  this  study  the  vector  y has  the  family  means  (incidences)  and  the  vector  a the  test 
means.  The  elements  of  the  vector  x were  all  zero,  assuming  all  parents  come  from  the  same 
population  and  the  GCA's  are  expressed  as  deviations  from  the  mean  of  that  population.  As 
fiilly  detailed  in  White  and  Hodge  (1996),  the  elements  of  the  V matrix  were  computed  as: 
Diagonal:  Vj;  = + 0^^  +a^p/b  + [o^V(b*n)],  and 

Off-Diagonal:  Vy  = (l/4)*OA(jj). 

The  elements  for  the  C matrix  were  computed  as 

Cjj  = (l/4)*OA(ij) 

with 

o\=a^A/4 

= o V4 


= io\{0)  - /(I  + RVw) 
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a^A  = h\*a2y(0)*rBg 

= (h\*(j2y(o))  - o\ 

«^A(ij)  = TBg*  [(a^Ai  + O^Ei)*(CT^Aj  + <y^AEj)]'^^ 

0^(0)  = P*(l-P) 

h\  = [f»h2b(i)  + 50*h\J/[f+50] 

iBg  = 0.405322+0.8709*(pi+Pj)-0.3162*(pi+Pj)2  - 0.7124*(pj-pj)  (4-4) 

h^b(i)  = 0.00156  + 1.295 V^-  1.115  V (4-5) 

where 

h^b  is  the  weighted  average  of  heritabilities  estimated  from  the  test  and  predicted 
based  on  the  mean  incidence  level  level; 
h^b(i)  is  the  single-site  (biased)  heritability  estimated  for  the  j“’  environment; 
h\s  is  the  predicted  single-site,  biased  heritability  (standard  h^  for  that  mean  incidence 
level); 

f is  the  number  of  families  in  the  test  (here  f=80); 
p is  the  mean  incidence  in  the  test; 

Pi  & Pj  is  the  mean  incidences  in  tests  i and  j,  respectively; 

a^y(o)  is  the  phenotypic  variance; 

rBg  is  the  type  B genetic  correlation; 

a^A  is  the  additive  variance; 

o^ae  is  the  additive  by  environment  component; 

a^f  is  the  GCA  variance  component; 

a^AE  is  the  additive  by  environment  variance  component; 
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o^fe  is  the  GCA  by  environment  variance  component; 

is  the  proportion  of  plot  to  within-plot  variance  component; 

a^p  is  the  plot  variance  (block  by  family)  component; 
is  the  within  plot  error  variance; 

OA(i j)  is  the  additive  covariance  between  tests  i and  j . 

Equations  4-4  and  4-5  were  obtained  by  Dieters  et  al.  ( 1 996)  by  fitting  the  estimates 
of  heritability  (h^b(i))  type  B genetic  correlation  (rgg)  to  the  incidence  in  the  tests  (p,  Pi, 
and  Pj)  using  a large  data  set  of  fusiform  rust  disease.  Since  those  equations  were  based  on 
real  data,  they  may  not  fit  well  to  those  of  the  simulations.  Therefore,  comparisons  of  this 
method  with  others  may  not  be  valid. 

In  method  III  (univariate  BLUP  on  binary  data  per  site),  GCA's  were  predicted  by 
BLUP  applied  to  single  site  binary  (0/1)  data  as  if  they  were  continuous.  Two  GCA 
predictions  were  obtained  per  parent,  one  for  each  site.  The  comparison  criteria,  described 
below,  were  applied  to  GCA’s  from  each  site  individually. 

In  method  IV  (univariate  BLUP  on  binary  data  across  sites),  GCA’s  were  predicted 
by  BLUP  applied  to  the  binary  data  across  sites  as  would  be  common  for  continuous 
variables.  The  approach  used  was  a univariate  analysis  with  the  data  of  the  two  environments 
treated  as  a single  trait.  The  model  used  was  one  similar  to  that  in  4- 1 , except  that  here  the 
observations  (yy^i)  were  0/1  instead  of  continuous  data.  A single  GCA  for  each  of  the  80 
parents  was  obtained  by  this  method. 

In  methods  V (imivariate  BLUP  on  plot  means)  and  VI  (univariate  BLUP  on  probit 
transformed  data),  the  same  approach  as  method  IV  was  used,  except  that  instead  of 
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individual  (0/1)  data,  plot  means  were  used.  In  method  V the  raw  plot  means  were  used,  and 
in  method  VI  those  means  were  transformed  by  probit  before  analysis.  In  method  VI,  the 
predictions  obtained  in  the  continuous  scale  were  transformed  to  the  binary  scale  with  the 
50%  incidence  specified  for  the  target  environment,  in  the  same  manner  as  for  method  I 
(equation  4-2). 

In  methods  VII  (bivariate  BLUP  on  binary  data),  VIII  (bivariate  BLUP  on  plot 
means),  and  IX  (bivariate  BLUP  on  probit  transformed  data)  GCA's  were  predicted  using  a 
bivariate  approach,  with  each  environment  treated  as  a different  trait  or  variable.  Therefore, 
two  GCA's  per  parent  were  obtained,  one  for  each  environment.  In  method  VII  binary  data 
were  used.  In  method  VIII  the  raw  plot  means  were  used,  and  in  method  IX  those  means 
were  transformed  by  probit  before  analysis. 

The  true  GCA’s  were  computed  using  the  method  described  in  equation  3-11.  The 
true  variance  components  on  the  underlying  scale  were  used  in  that  equation.  The  true 
GCA’s  were  adjusted  for  the  sample  mean  before  the  criteria  for  comparing  methods  were 
applied. 

In  the  second  stage,  only  methods  I and  IV  were  used,  representing  generalized  mixed 
models  and  univariate  BLUP  methods,  respectively.  These  two  methods  were  chosen 
because  of  their  good  performance  in  the  first  stage.  Besides  that,  method  IV  is  the  most 
commonly  used  BLUP  method  for  the  prediction  of  breeding  values  for  threshold  traits 
(Foulley  and  Im  1989)  and  method  I presents  the  desirable  characteristic  of  allowing 
comparisons  among  values  obtained  from  tests  with  different  incidences  at  the  same  basis. 
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Criteria  to  Compare  Methods 

Three  criteria  were  used  to  compare  the  methods  of  analysis.  For  the  first  two  criteria 
described  below,  the  top  1 0 parents  out  80  were  selected  based  on  their  predicted  or  true 
GCA's,  according  to  the  criterion  considered.  For  methods  that  allow  two  predictions  per 
parent  (methods  III,  VII,  VIII,  IX),  the  criteria  were  applied  to  each  individually.  For  these 
methods,  selection  was  based  on  each  set  of  GCA’s,  one  for  each  site.  The  first  criterion  was 
the  bias  in  predicted  genetic  gain  given  by  the  ratio  of  predicted  over  true  genetic  gains  for 
the  top  10  out  of  80  parents  (PT  ratio).  This  ratio  was  computed  by  dividing  the  average  of 
the  predicted  GCA’s  by  the  corresponding  average  of  the  true  GCA’s  of  the  top  10  parents 
out  of  80  available,  selected  based  on  the  predicted  values.  Ideally  this  ratio  should  be  1 , or 
1 00  when  expressed  in  percent,  which  occurs  when  the  predicted  genetic  gain  is  the  same  as 
the  true,  that  is,  the  true  gain  is  accurately  predicted. 

The  second  criterion,  called  "Efficiency  of  Selection"  (ES  ratio)  by  Meijering  and 
Gianola  (1985),  was  the  ratio  of  the  true  genetic  gain  achieved  from  selection  using  one  of 
the  methods  of  prediction,  relative  to  the  maximum  gain  that  could  be  achieved  if  selection 
is  based  directly  on  the  true  GCA’s.  This  was  computed  as  the  ratio  of  the  average  of  the  true 
breeding  values  of  the  top  1 0 parents  selected  based  on  one  of  the  methods  of  prediction  over 
the  average  of  the  true  breeding  values  of  the  top  1 0 parents  selected  directly  on  the  true 
breeding  values.  This  ratio  reflects  how  much  of  the  maximum  gain  is  realized  when  one  of 
the  methods  of  prediction  is  used  to  make  selection.  Ideally,  this  should  be  1 or  1 00%, 
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meaning  that  the  maximum  genetic  gain  is  being  achieved  by  selection  when  the  true  top 
parents  are  selected. 

The  third  criterion  was  the  Pearson  correlation  between  predicted  and  true  GCA's. 
Ideally  this  should  be  one,  indicating  that  the  predicted  GCA's  rank  the  same  as  the  true 
GCA's. 

F or  methods  III,  VII,  VIII,  and  IX,  that  provide  two  GCA  predictions  for  each  parent, 
selection  was  done  in  each  environment  separately.  In  order  to  distinguish  these  two 
predictions  the  values  1 and  2 were  added  to  the  name  of  the  method.  For  example,  VII- 1 
refers  to  the  prediction  for  environment  1 using  bivariate  BLUP  on  0/1  data. 

Results  and  Discussion 

First  Stage  Analysis 

Overall,  the  methods  perform  more  poorly  when  the  pair  of  environments  moves 
from  50%:  50%  incidences  to  situations  in  where  one  of  the  environments  has  very  high  or 
very  low  incidences  of  the  undesirable  phenotype  (Table  4-1).  It  should  be  pointed  out  that 
both  Best  Linear  Prediction  and  Best  Linear  Unbiased  Prediction  assume  linearity  between 
the  general  combining  abilities  (GCA)  from  the  two  environments.  When  this  assumption 
is  satisfied,  information  between  tests  is  efficiently  translated  with  linear  covariances. 
However,  the  GCA’s  for  binary  data  on  pairs  of  tests  with  different  mean  incidences  are  not 
linearly  related  (F  igure  4- 1 ),  making  the  translation  of  information  by  linear  covariances  less 
efficient.  As  shown  in  Figure  4-1,  the  farther  apart  the  incidences  in  the  two  environments. 
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Table  4-1.  Predicted  genetic  gain  bias  (PT)  and  efficiency  of  selection  (ES)  when  selecting 
10  parents  out  80;  and  Pearson  correlation  (r)  between  predicted  and  true  GCA’s,  for  nine 
analytical  methods  and  three  pairs  of  incidences  of  the  undesirable  phenotype  in  two 
environments. 


Method  ® 

Env.  1: 

50% 

Env  2:  50% 

Env.  1: 

10% 

Env  2:  50% 

Env.  1: 

90% 

Env  2:  50% 

Target 

(%) 

PT 

(%) 

ES 

{%) 

r 

Target 

(%) 

PT 

(%) 

ES 

(%) 

r 

Target 

(%) 

PT 

(%) 

ES 

(%) 

r 

I 

50 

98 

92 

0.92 

50 

96 

89 

0.89 

50 

96 

90 

0.89 

II 

50 

73 

92 

0.91 

50 

65 

89 

0.89 

50 

73 

90 

0.89 

III-l 

50 

94 

87 

0.85 

10 

99 

82 

0.78 

90 

96 

81 

0.78 

III -2 

50 

94 

87 

0.85 

50 

94 

86 

0.85 

50 

94 

86 

0.84 

IV 

50 

94 

92 

0.92 

30 

69 

91 

0.89 

70 

64 

90 

0.89 

V 

50 

95 

92 

0.92 

30 

69 

91 

0.89 

70 

64 

90 

0.89 

VI 

50 

145 

89 

0.90 

50 

125 

84 

0.85 

50 

126 

87 

0.85 

VII-  1 

50 

97 

91 

0.91 

10 

120 

91 

0.87 

90 

93 

88 

0.87 

VII -2 

50 

97 

91 

0.91 

50 

89 

89 

0.89 

50 

102 

90 

0.89 

VIII  - 1 

50 

97 

91 

0.91 

10 

120 

91 

0.87 

90 

93 

88 

0.87 

VIII  - 2 

50 

97 

91 

0.91 

50 

89 

89 

0.89 

50 

102 

90 

0.89 

IX-  1 

50 

168 

89 

0.90 

10 

408 

89 

0.83 

90 

277 

84 

0.83 

IX -2 

50 

168 

89 

0.90 

50 

166 

87 

0.87 

50 

169 

87 

0.87 

^ Methods:  I - Univariate  GLIMM;  II  - BLP  method;  III  - Univariate  BLUP  on  binary  data  with  analysis 
per  site,  and  predictions  from  environments  1 (III-l)  and  2 (III-2));  IV  - Univariate  BLUP  combining  binary 
data  across  sites;  V - Univariate  BLUP  combining  plot  means  across  sites;  VI  - Univariate  BLUP 


combining  probit  transformed  data  across  sites  ; VII  - Bivariate  BLUP  on  Binary  data,  with  predictions 
from  environments  1 (III-l)  and  2 (III-2);  VIII-  Bivariate  BLUP  on  plot  means,  with  predictions  from 
environments  1 (III-l)  and  2 (III-2);  and  IX  - Bivariate  BLUP  on  probit  transformed  data,  with  predictions 
from  environments  1 (III-l).  and  2 (III-2). 


b/  Incidence  of  the  undesirable  phenotype  in  the  target  environment 
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GCA  underlying  GCA  50%  (%) 


Figure  4-1 . True  general  combining  abilities  (GCA's)  on  the  underlying  scale  and 
on  the  binary  scale,  showing  nonlinearity  between  GCA’s  evaluated  by  equation 
4-2,  for  several  values  of  gyj,  at  10, 30,  50,  70,  and  90%  incidences. 
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larger  the  nonlinearity.  For  example,  plots  of  GCA’s  in  environments  with  10  and  90% 
incidences  are  less  linear  than  plots  of  GCA’s  from  environments  with  10  and  30%. 
Therefore,  linear  methods  would  be  expected  to  perform  more  poorly  for  the  first  case. 

Methods  IV  and  V (imivariate  BLUP  across  sites),  followed  by  methods  I (GLIMM), 
II  (BLP),  VII  and  VIII  (Bivariate  BLUP)  presented  the  highest  efficiency  of  selection  (Table 
4-1).  On  the  other  hand,  BLUP  based  on  single  site  analysis  (III)  and  on  probit  transformed 
data  (VII,  IX)  presented  the  lowest  efficiency.  The  same  trend  was  observed  when  the 
correlation  between  true  and  predicted  GCA's  was  used  as  criterion. 

When  the  criterion  bias  on  the  genetic  gain  (PT)  is  considered,  overall  the  methods 
underpredict  the  gain  realized  with  selection  (PT  < 1 00%),  except  methods  based  on  probit 
transformation  (VI,  IX)  and  bivariate  BLUP  when  one  of  the  locations  has  low  incidence 
(Table  4-1).  Based  on  this  criterion,  methods  I (GLIMM)  and  III  (Univariate  BLUP  with 
single  site  analysis)  outperform  the  other  methods.  Method  I,  also  presented  high  efficiency 
of  selection,  high  correlation  between  true  and  predicted  GCA's,  and  has  the  additional 
advantage  of  easily  predicting  genetic  merit  for  any  target,  even  when  the  data  from  that 
environment  is  not  included  in  the  analysis. 

Methods  VI  (univariate  BLUP)  and  IX  (bivariate  BLUP),  based  on  probit 
transformation,  overpredict  the  true  gain  realized  from  selection  for  the  target  environment 
(PT  > 100%,  Table  4-1).  Also,  the  gain  achieved  with  these  two  methods,  relative  to  the 
maximum  gain,  is  smaller  than  the  gain  achieved  by  other  methods  of  prediction  (smaller 
ES),  except  method  III  (univariate  BLUP  analysis  per  each  environment  separately). 
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Methods  II  (BLP),  IV,  and  V underpredict  the  gain  realized  from  selection  (PT  < 
1 00%,  Tables  4- 1 ).  However,  the  gain  realized  by  selection  on  the  predictions  based  on  these 
methods  is  close  to  the  maximum  gain  and  the  correlation  between  predicted  and  true  GC  A’ s 
is  high.  Therefore,  despite  the  fact  these  methods  underpredict  the  realized  gain,  they  rank 
the  candidates  well  for  selection. 

Bivariate  methods  (VII,  VIII)  perform  poorly  in  predicting  the  realized  gain,  when 
the  incidence  in  environment  1 is  either  10  or  90%.  At  the  pair  of  incidences  10%:50%, 
methods  VII  and  VIII  underpredict  the  gain  for  environment  2 (with  50%  incidence)  and 
overpredict  for  environment  1 (with  10%).  The  opposite  occurs  when  the  pair  of 
environments  has  incidences  90%:50%. 

For  balanced  data,  as  those  in  this  study,  Euialysis  on  either  0/1  individual  data 
(methods  IV  and  VII)  or  plot  means  (methods  V and  VIII)  produce  equal  results  (Table  4-1). 
The  assumptions  for  these  analyses  are  violated  by  0/1  data  and,  in  many  cases,  the  central 
limit  theorem  is  invoked  to  justify  analysis  based  on  plot  means  assuming  the  violations 
would  be  less  severe.  Based  on  the  results  of  this  study  either  BLUP  is  robust  to  those 
assumptions  or  1 0 individuals  per  plot  is  not  enough  to  have  an  increased  benefit  of  using 
plot  means.  In  theory,  at  least  30  individuals  are  required  for  the  distribution  of  the  errors  to 
be  approximated  by  a normal  distribution  (Mendenhall  et  al.  1 990) 

The  correlations  between  predicted  and  true  GCA's  for  bivariate  analysis  (method 
VII)  was  higher  than  that  with  univariate  analysis  on  single  site  (method  III).  Also,  when  the 
incidences  in  the  two  environments  are  different,  a smaller  increase  in  correlation  occurred 
in  the  environment  with  50%  incidence  (Table  4-1).  These  results  agree  with  the  theoretical 
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results  by  Schaeffer  (1984).  According  to  Schaeffer  (1984),  multivariate  analysis  presents 
higher  benefit  (reduction  in  predicted  error  variance  and  then  increase  in  the  correlation  of 
breeding  values)  over  univariate  analysis  when  the  absolute  difference  between  genetic  and 
environmental  correlations  between  the  traits  is  large.  Schaeffer  (1984),  also  showed  that  if 
the  genetic  correlation  is  larger  than  the  environmental,  the  trait  with  lower  heritability  will 
have  greater  benefit  than  that  with  higher  heritability.  In  our  study,  the  genetic  correlation 
between  the  traits  (environments)  on  the  underlying  scale  is  one  and  the  environmental 
correlation  zero.  Also,  from  Dempster  and  Lemer  (1950)  and  from  Figure  2-4  in  Chapter  2, 
the  heritability  for  binary  traits  is  maximum  when  the  incidence  is  50%.  Therefore,  here  the 
genetic  correlation  is  larger  than  the  environmental  and  the  heritability  in  environment  2 (the 
one  with  50%  incidence)  is  always  greater  than  that  in  environment  1. 

Second  Stage  Analysis 

For  this  second  stage,  only  two  methods  of  prediction  were  considered.  Both  are 
based  on  a univariate  analysis  combining  0/1  data  across  environments.  However,  one  uses 
probit  as  a link  function  and  assumes  binomial  errors  (method  I or  GLIMM).  The  other  is 
based  on  normal  theory  (method  IV  or  univariate  BLUP).  For  these  two  methods  the 
correlation  between  true  and  predicted  breeding  values  increases  when  the  incidences  move 
from  extreme  values  (10  or  90%)  to  50%  (Figure  4-2).  The  overall  maximum  occurs  when 
the  incidence  in  the  two  environments  are  50%. 

Both  methods  I and  IV  xmderestimated  the  true  gain  (PT  < 100%,  Figure  4-3). 
However,  method  IV  underestimates  the  true  gain  more  severely  than  method  I,  particularly 
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Figxire  4-2.  Correlation  between  predicted  and  true  general  combining 
abilities  on  the  binary  scale  for  several  combinations  of  incidence  of 
the  undesirable  phenotype  in  two  environments,  for  two  methods  of 
analysis:  (a)  GLIMM  or  method  I;  and  (b)  Univariate  BLUP  on  0/1 
data  or  method  IV. 
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Figure  4-3.  Bias  of  the  predicted  genetic  gain  from  selection  of  the  top  10 
out  80  parents,  PT  ratio,  for  several  combinations  of  incidence  of  the 
undesirable  phenotype  in  two  environments,  for  two  methods  of  analysis: 
(a)  GLIMM  or  method  I;  and  (b)  Univariate  BLUP  on  0/1  data  or  method 
IV.  Each  bar  is  an  average  of  300  simulated  experiments  and  the  vertical 
lines  are  95%  confidence  intervals.  The  incidence  in  the  target  environment 
is  the  average  of  the  incidences  in  the  two  environments. 


82 


when  the  incidences  of  the  two  environments  are  in  two  opposite  extremes,  as  for  example 
in  10%:90%  or  30:70%.  When  the  two  environments  have  50%  incidence,  these  two 
methods  are  equally  accurate.  Considering  only  this  criterion,  method  I is  more  robust  to 
changes  in  incidences. 

The  efficiency  of  selection  for  both  methods  are  higher  at  intermediate  incidences  of 
the  undesirable  phenotype  (Figure  4-4).  However,  while  in  the  univariate  BLUP  the 
efficiency  of  selection  is  almost  the  same  for  very  low  and  very  high  incidences,  GLIMM 
has  higher  efficiency  of  selection  for  very  high  incidence  compared  with  very  low.  When  the 
incidences  in  the  two  environments  are  very  low,  the  selection  efficiency  is  higher  for 
univariate  BLUP. 

The  efficiency  of  selection  from  methods  I and  IV  is  maximized  when  one  of  the 
environments  has  50  or,  more  commonly,  70%  incidence  of  the  undesirable  phenotype,  with 
the  global  maximum  efficiency  when  the  two  environments  have  70%  incidence  (Figure 
4-4). 

Although  estimation  of  heritability  was  not  the  major  objective  of  this  study, 
estimates  were  obtained  on  the  underlying  scale  to  evaluate  the  robustness  of  method  I across 
several  combinations  of  incidences.  In  Figure  4-5,  method  I is  quite  robust  to  the  change  in 
incidences,  however,  it  consistently  underestimates  the  true  value  (h^  = 0.35). 

Conclusions 


Overall,  considering  the  genetic  architecture  and  incidences  in  this  study,  vmivariate 


BLUP  combining  either  0/1  or  plot  means  data  from  two  environments  gave  the  highest 
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Figure  4-4.  Efficiency  of  selection  from  selecting  the  top  10  out  80  parents, 
ES,  for  several  combinations  of  incidence  of  the  undesirable  phenotype  in 
two  environments,  for  two  methods  of  analysis:  (a)  GLIMM  or  method  I;  and 
(b)  BLUP  on  0/1  data  or  method  IV.  Each  bar  is  an  average  of  300  simulated 
experiments  and  the  vertical  lines  are  95%  confidence  intervals.  The 
incidence  in  the  target  environment  is  the  average  of  the  incidences  in  the  two 
environments. 
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Figure  4-5.  Heritabilities  on  the  underlying  scale  estimated  by  GLIMM,  for  several 
combinations  of  incidence  of  the  desirable  phenotype  in  two  environments.  Each  bar 
is  an  average  of 300  simulated  experiments  and  the  vertical  lines  are  95%  confidence 
intervals.  The  dashed  line  is  the  true  heritability. 


85 


efficiency  of  selection  (realized-maximum  gain  ratio)  and  the  highest  correlation  between 
true  and  predicted  general  combining  abilities.  However,  these  methods  severely 
underpredict  the  realized  gain  when  the  incidences  in  the  two  environments  are  in  two 
opposite  extremes,  as  for  example  in  10%:90%  or  30%:70%. 

The  BLP  method  using  a function  to  model  the  variances  and  covariances  (method 
II)  also  vmderpredicts  the  gain  realized  by  selection.  However,  as  for  xmivariate  BLUP,  BLP 
method  still  ranks  the  candidates  for  selection  close  to  the  true  rank  and  results  in  true  gain 
close  to  the  maximum 

The  generalized  linear  mixed  model  method  (method  I),  opposite  to  imivariate  BLUP 
and  BLP,  predicts  the  realized  gain  accurately  for  all  pairs  of  incidences  studied  (all 
combinations  of  10,  30,  50,  70,  and  90%  of  the  imdesirable  phenotype).  However,  the 
efficiency  of  selection  for  this  method  is  smaller  than  that  with  Univariate  BLUP  on  0/ 1 data; 
particularly  at  extreme  incidences  of  the  undesirable  phenotype. 

Univariate  and  bivariate  BLUP  on  probit  transformed  data  (methods  VI  and  IX) 
extremely  overpredict  the  gain  realized  by  selection. 

The  efficiency  of  selection,  given  by  the  ratio  between  realized  and  maximum  gains, 
is  maximized  when  one  of  the  environments  has  50  or  70%  incidence  of  the  imdesirable 
phenotype,  with  a global  maximum  when  both  environments  have  70%  incidence. 

A limitation  of  the  present  study  is  that  only  two  environments  were  used  each  time, 
with  all  parents  represented  in  both  environments.  Therefore,  the  results  may  not  apply  for 
situations  which  only  a few  parents  are  common  to  both  environments. 


CHAPTER  5 

RANDOM  AND  SELECT:  PROGRAMS  FOR  THE  SIMULATION  OF 
POPULATIONS  AND  MASS  SELECTION  ON  BINARY  THRESHOLD  TRAITS 

Introduction 

In  many  situations  in  quantitative  genetics,  analytical  methods  do  not  exist  that  allow 
comparison  of  techniques  of  analysis.  In  other  situations,  concepts  can  be  demonstrated  if 
data  with  a known  genetic  architecture  are  available.  In  these  situations,  simulation  that 
mimics  real  progeny  tests  and  other  activities  involved  in  breeding  is  required.  In  the 
literature,  programs  that  simulate  progeny  tests  and  selection  are  not  available,  particularly 
when  a threshold  model  is  assumed. 

The  purpose  of  this  chapter  is  to  describe  two  computer  programs  developed  in  S AS® 
language  (S  AS  Institute,  1 990)  for  simulation  of  populations  in  progeny  (half-sib  families) 
tests  and  mass  selection  on  a binary  threshold  trait.  The  first  program,  RANDOM.  S AS 
(appendix  A),  simulates  data  from  a half-sib  progeny  test  in  several  environments,  blocks 
and  families.  The  second  program,  SELECT.SAS  (appendix  B),  simulates  a complete  cycle 
of  mass  selection  on  a binary  threshold  trait  and  generates  simulated  progenies  of  the 
selected  individuals.  Realized  gain  and  heritabilities  for  both  the  continuous  and  the  binary 
threshold  traits  are  computed. 
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Description  of  RANDOM.SAS 

input  Information 

The  first  step  to  run  RANDOM.SAS  is  to  set  the  number  of  levels  for  each  factor  in 
the  simulated  progeny  tests,  such  as:  number  of  samples  or  progeny  tests  (nsamp),  number 
of  environments  (nenv),  number  of  families  (nfam),  number  of  blocks  (nblk),  and  number 
of  trees  in  each  block  per  each  family  (ntrees).  Also  in  this  step,  we  input  the  name  of  the  file 
where  we  want  the  data  to  be  saved. 

The  second  step  is  to  describe  the  genetic  architecture  of  the  population  we  are 
sampling  from.  Here,  we  enter  the  population  variance  components  associated  with 
environment  or  test  location  (Ve),  blocks  (Vb),  half-sib  family  or  general  combining  ability 
(Vf),  family-by-environment  interaction  (Vfe),  family-by-block  interaction  (Vp),  and  trees 
within  family  and  block  (Vw).  The  population  mean  (mu)  is  also  set  in  this  step. 

Algorithm 

Two  traits  are  simulated  following  the  threshold  model:  an  imderlying  continuous 
trait  and  the  resultant  binary  trait.  The  model  used  to  simulate  the  continuous  trait  is 
y(0)ijkl  “ 1^(0)  + ®(0)i  + b(o)ij  + f(0)k  fC(0)ik  P(0)ijk  W(o)jjk| 


— P(0)  + C(0)i  + b(o)jj  + f(0)k  + fe(0)ik  P(0)ijk  %)ijkl  + V(o)ijk| 


(5-1) 
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where 

y(o)ijki  is  the  underlying  continuous  trait  in  the  base  population  (generation  zero); 

P(o)  is  the  general  mean  in  the  base  population; 

C(o)i  is  the  effect  of  the  i*  environment,  C(o)i  ~ N(0,  Ve),  i = 1,...,  nenv; 

b(o)ij  is  the  effect  of  the  j*  block  in  the  i*  environment,  b(o)ij  ~ N(0,  Vb),  j =1 nblk; 

f(o)k  is  the  effect  of  the  k*  female  general  combining  ability, 
f(o)ic  ~ N(0,l/4Va),  k =l,...,nfam  ; 

fC(o)ik  is  the  effect  of  the  k*  female  geneal  combining  ability  in  the  i*  environment, 
fe(o)ik  ~ N(0,Vfe); 

P(o)ijk  is  the  effect  ijk*  plot,  P(o)ijt  ~ N(0,Vp);  and 

W(o)ijki  is  the  effect  of  the  I***  individual,  in  the  yk*  plot,  1 =1,...,  ntrees, 

W(o)ijki  ~ N(0,  3/4Va  + Ve  = Vw); 

U(o)ijki  is  the  random  effect  associated  with  the  1“*  male  general  combining  ability  and 
the  Mendelian  sampling  of  the  ijkl*  individual,  U(o)yki  ~ N(0,  3/4Va);  and 

V(o)ijki  is  the  error  associated  to  the  ykl*  individual,  V(o)jjki  ~ N(0,  Vv). 

Covariances  among  all  effects  in  model  5-1  are  assumed  to  be  zero.  In  both 
RANDOM.SAS  (appendix  A)  and  SELECT.SAS  (appendix  B)  the  variables  P(o),  C(o)i,  b(o)ij, 
f(o)k’  f®(o)ik»  P(o)ijk>  W(o)ijki,  and  V(o)ijki  are  called  Mu,  en,  bn,  fn,  fen,  pn,  and  in,  respectively. 

The  true  breeding  values  of  the  individuals  in  a given  sample  (progeny  test)  are 
obtained  by  a(o)ijki  = f(o)k  + U(o)ijki,  in  the  base  population.  These  breeding  values  are  retained 
and  used  in  the  generation  of  the  progenies  of  the  selected  individuals,  as  described  in 


equation  5-3. 


89 


In  the  simulation  of  the  random  effects  e(o)i,  b(o)ij>  f(o)ic5  f®(o)ik5  P(o)ijk»"^(o)ijkb  U(o)ijki» 
V(o)ijki,  standard  normal  random  deviates  are  generated  using  the  function  CALL  RANNOR, 
from  SAS®.  The  random  deviates  generated  this  way  do  not  have  a mean  of  zero  and 
variance  of  one,  but  are  close  to  these  values.  Therefore,  before  transforming  the  deviates  to 
the  distribution  required,  they  are  first  standardized  to  have  a mean  and  a variance  of  exactly 
zero  and  one,  respectively.  Finally,  the  deviates  are  transformed  to  have  the  variance  required 
for  that  factor.  For  example  for  the  factor  "environment" 

6(0)i  ~ 6'(0)i  * ■'^Ve 

where  e'(o)j  is  the  effect  of  the  i*  environment  with  mean  zero  and  variance  one,  that  is, 
e'(o)i  ~ N(0,  1),  i = 1,...,  nenv.  The  phenotypic  values  for  the  underlying  continuous  trait, 
y(o)ijki»  generated  by  summing  the  population  mean  and  the  deviates  according  to  the 
model  5-1. 

The  phenotypic  values  for  the  binary  trait,  x^o)ijkiJ  ^re  generated  by  imposing  a 
threshold,  T,  in  the  population  of  y(o)ijki's 

0 if  y(o)ijki  ^ T (desirable  phenotype) 

^(o)ijki  1 1 if  > T (undesirable  phenotype) 

Different  threshold  values  can  be  imposed  in  the  distribution  of  the  continuous  trait 
to  simulate  different  incidences  of  the  undesirable  phenotype  (X(o)ijH  =1),  of  the  binary  trait. 
The  threshold  values  are  obtained  automatically  by  the  program.  For  this,  only  the  incidence 


of  the  trait  is  required. 
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Output 

RANDOM.SAS  outputs  to  a file  the  descriptor  variables  that  identify  the  data 
(sample,  env,  fam,  blk,  trees),  the  continuous  trait  (Y),  the  true  breeding  value  for  the 
continuous  trait  (BVi),  and  the  binary  threshold  trait  (X). 

Description  of  SELECT.SAS 


Input  Information 

SELECT.SAS  requires  the  same  input  information  as  RANDOM.SAS  for  the  number 
of  levels  of  the  different  factors,  variance  components  and  filenames.  Besides  that 
information,  SELECT.SAS  also  requires  a file  created  by  RANDOM.SAS  containing: 
sample  number  (sample),  environment  (env),  family  (fam),  block  (blk),  trees,  continuus  trait 
(Y),  and  the  individual  true  breeding  values  on  underlying  scale  (BVi). 

Algorithm 

From  each  progeny  test  (base  population)  simulated  by  RANDOM.SAS  (appendix 
A),  mass  selection  is  performed  on  the  binary  trait  by  SELECT.SAS  (appendix  B),  selecting 
nfam  uninfected  individuals  (individuals  with  X(o)ijjt  = 0)  per  test  to  advance  to  the  next 
generation.  These  nfam  individuals  are  intermated  by  randomly  crossing  each  female  with 
ntrees  different  males  out  of  the  nfam  available,  generating  ntrees  half-sib  offsprings  per 
cross  (one  progeny  for  each  male  parent).  No  action  is  taken  to  avoid  reciprocal  crosses. 
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The  phenotypic  values  of  the  progenies  of  selected  parents  are  obtained  according  to 
the  model 

y(l)ijkl  ~ ^(1)  C(l)i  + b(i)ij  + (*/2  a(0)kf  + 3(0)km)  f®(l)ik  P(l)ijk  S(i)jjk|  + (5-3) 

where 

a^o)kf  is  the  breeding  value  of  the  female  parent  of  the  ijkl“'  individual  (generation 

zero); 

3(o)km  is  the  breeding  value  of  the  male  parent  of  the  ijld*  individual  (generation  zero); 
S(i)ijki  is  the  random  effect  associated  with  the  Mendelian  sampling  of  the  ijkl* 
individual,  S(i)jjki  ~ N(0,l/2Va);  and 

V(,)ijki  is  the  error  associated  with  the  ijkl*  individual  in  generation  1 , e(i)ijki  ~ N(0,  Ve). 
The  other  factors  are  as  previously  described  for  model  5-1 . 

The  true  breeding  values  a^o>kf  3(o)km  those  from  the  individuals  selected  to 
advance  generation.  The  same  threshold  values  used  in  the  base  population  (equation  5-2) 
are  also  applied  in  the  progeny  of  selected  individuals  to  generate  the  binary  variable  X(i)jjk|. 

Output 

SELECT.  S AS  outputs  the  means  (ppY,  sY,psY,  ppX,  sX,  psX)  and  variances  (vppY, 
vsY,  vpsY,  vppX,  vsX,  vpsX)  of  the  base  (pp),  selected  (s)  and  progeny  of  selected 
populations  (ps),  for  the  binary  (X)  and  underlying  continuous  traits  (Y).  The  program 
outputs  also  the  realized  gains  (GRy,  GRx)  and  realized  heritabilities  (h2y,  h2x)  for  those 


traits. 


APPENDIX  A 

SAS  CODE  FOR  RANDOM.SAS 


? 

* Program  : RANDOM.SAS  *; 

* Objective:  Generates  continuous  and  binary  data  according  to  the  threshold  *; 

* model  * ; 

sfe  He . 

? 

* Date:  Aug/96  *; 

* Version:  July  5, 1998  *; 


proc  datasets  library=work  KILL;  run;*  clean  files  before  starting; 


* Information 

data  aO; 

%Let  incid  =05;* 

%Let  nsamp=300;  * 
%Letnenv  =1;* 

%Letnfam  =80;* 
%Letnblk  =1;* 

%Let  ntrees=24;* 

%Let  dta=CH  lA26.dat;* 
%Let  VC=%STR(  Ve  = 0;* 

Vb  = 0; 


Incidence  of  the  trait  (Use  two  digits:  e.g.  01, 05,  75%); 
Number  of  samples  or  progeny  tests; 

Number  of  environments  or  test  locations; 

Number  of  half-sib  families; 

Number  of  blocks; 

Number  of  trees  per  family  per  block; 

Name  of  the  file  to  have  the  data  saved; 

True  variance  components; 


Vf  = 0.75405; 
Vfe=  0; 


Vp  = 0; 
Vw  = 9.3; 
Mu=0;); 


* Generates  deviates  according  to  a standard  normal  distribution 

data  al  a2  a3; 

seed  =int(time());*  Use  the  current  time  as  seed; 
put  seed=; 

do  Sample  = 1 to  &nsamp; 
do  fam=l  to  &nfam; 
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call  rannor(seed,f); 
output  al; 
end; 

do  env=  1 to  &nenv; 
call  rannor(seed,e); 
do  fam=l  to  &nfam; 
call  rannor(seed,fe); 
output  a2; 
end; 

do  blk=l  to  &nblk; 
call  rannor(seed,b); 
do  fam=l  to  &nfam; 
call  rannor(seed,p); 
do  trees=l  to  &ntrees; 
call  rannor(seed,w); 
call  rannor(seed,i); 
output  a3 ; 
end; 
end; 
end; 
end; 
end; 
run; 

data  al;  set  al  (keep=Sample  fam  f); 
proc  sort  data=al ; 
by  Sample  fam;  run; 

data  a2;  set  a2(keep=Sample  env  fam  e fe); 
proc  sort  data=a2; 
by  Sample  fam;  run; 

data  a3;  set  a3(keep=Sample  env  blk  fam  trees  b p w i); 
proc  sort  data=a3 ; 

by  Sample  env  fam;  run; 
data  a4; 

merge  al(in=TU)  a2; 
ifTU; 

by  Sample  fam; 
run; 

proc  sort  data=a4; 

by  Sample  env  fam;  run; 
proc  datasets  library=work; 

delete  al  a2;  run; 
data  al; 

merge  a4(in=TU)  a3; 


by  Sample  env  fam; 
ifTU; 
ii=l; 
dev  = w; 
run; 

proc  datasets  library=work; 
delete  a3  a4;  run; 

* Standardize  deviates  to  exactly  N(0,1) 

proc  sort  data=al; 

by  sample  fam; 
proc  means  noprint  data=al ; 
by  sample  fam; 
var  f; 

output  out=a2  mean=dev;  run; 
proc  sort  data=al ; 

by  sample  env  fam; 
proc  means  noprint  data=al; 
by  sample  env  fam; 
var  fe; 

output  out=a3  mean=dev;  run; 
data  a4;  set  al(keep=i); 
dev=i; 
run; 

data  a5;  set  al  a2  a3  a4  (keep=dev); 
ii=l; 

proc  means  noprint  data=a5; 
by  ii; 
var  dev; 

output  out=a6  n==nn  mean=Ave  var=Vari;  run; 
proc  print  data=a6; 

var  nn  Ave  Vari;  run; 
proc  datasets  library=work; 
delete  a2  a3  a4  a5; 
run; 
data  a4; 

merge  al  (In=TU)  a6;  by  ii; 

If  TU; 

e = (e  - Ave)/sqrt(vari); 
b = (b  - Ave)/sqrt(vari); 
f = (f  - Ave)/sqrt(vari); 
fe  = (fe  - Ave)/sqrt(vari); 
p = (p  - Ave)/sqrt(vari); 
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w = (w  - Ave)/sqrt(vari); 
i = (i  - Ave)/sqrt(vari); 

*  Transform  deviates  to  N(Mu,  Sdy**2) ; 

&VC 

SDy  = sqrt(Ve  + Vb  + Vf  + Vfe  + Vp  + Vw); 

Vw  = Vw-  3*Vf; 

Vi  = 3*Vf; 
en=  e*sqrt(Ve); 
bn=  b*sqrt(Vb); 
fn=  f*sqrt(Vf); 
fen=  fe*sqrt(Vfe); 
pn=  p*sqrt(Vp); 
wn=  w*sqrt(Vw); 
in=  i*sqrt(Vi); 

Y = Mu  + en  + bn  + fn  + fen  + pn  + wn  + in; 

BVi  = fn  + in; 

BVf=2*fn; 

Thresh  = M + probit(l-  0.&incid)*SDy; 
if  Y > Thresh  then  X=1 ; else  X=0; 

drop  Ve  Vb  VfVfe  Vp  VwVi  een  b bn  ffnfe  fen  ppn  wwn  i in  seed  Ve  Vb  Vf 
Vfe  Vp  Vw  Vi  Mu  Sdy; 

run; 

proc  datasets  library=work; 
delete  aO  al ; run; 

* Saves  data  to  be  used  by  SELECT.sas ; 

data  a5;  set  a4(keep  = Sample  env  fam  blk  trees  y BVi  X); 

file  "&dta"; 

put  sample  1-4  env  5-6  fam  7-10  blk  11-12  trees  13-14  y 15-25  .5+1  BVi  26-36  .5 
X 38-39; 
run; 

proc  datasets  library=work  KILL; 
run; 


APPENDIX  B 

SAS  CODE  FOR  SELECT.SAS 


? 

* Program  : SELECT.sas.  *; 

* Objective:  simulate  cycles  of  mass  selection  on  a binary  threshold  trait  * 

* Date:  Aug/96  *; 

* Version:  July  19, 1998  *; 

Hc)|cHc3ie3ie3ic3ic3ic)ic%3|(3ic9ie%3icH(HcH<9ic4(3ic:|c9H3|e:|c:ic3|c)ic}ic}is:|c}|c}|c^:|c^c^c:|c)|«9|c9i(^Hc3icH(H(Hc3i(3|cH<3k3i<3ic9|c9|c}ic9ic>kH<3ic)ie3ie9ie9ic9icH(3i()k)i<^9HH(>ic* 

? 

options  ps=100  ls=80; 

proc  datasets  library=work  KILL;  run;*  Clean  files  before  starting; 


* Information 

data  aO; 

%Let  incid  = 05;* 
%Let  Nsamp  = 300  ;* 
%LetNenv  =1;* 
%Let  Nfam  = 80;* 
%LetNblk  =1;* 
%Let  Ntrees=  24;* 


Incidence  of  the  trait  (Use  two  digits:  e.g.  01,  05,  75%); 
Number  of  samples  or  progeny  tests; 

Number  of  environments  or  test  locations; 

Number  of  half-sib  families; 

Number  of  blocks; 

Number  of  trees  per  family  per  block; 

%Let  dta  = CHlA26.dat;*  Name  of  the  file  to  read  the  data  from; 

%Let  size  = %eval(&Nfam*&Nblk*&Ntrees);*  experiment  size  = nfam**nblk*ntrees; 
%Let  VC=%STR(  Ve  = 0;*  True  variance  components; 

Vb  = 0; 

Vf=  0.24180; 

Vfe=  0.13010; 


Vp  = 0; 

Vw  = 9.3000; 
M=0; 


SDy=sqrt(Ve  + Vb  + Vf  + Vfe  + Vp  + Vw);); 
title  1 "Mass  Selection  on  Binary  Threshold  Traits"; 
title2  "Incidence:  &incid%  Architecture:  &dta"; 


run; 

options  ps=1000  ls=80  nodate; 
data  al; 

Infile  "&dta"; 

Input  sample  1-4  env  5-6  fam  7-10  blk  11-12  trees  13-14  y 15-25  .5+1  BVi  26-36  .5 
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X 38-39; 


run; 


* Base  Population 

proc  sort  data=al ; 

by  sample  env; 
proc  means  noprint  data=al ; 
by  sample  env; 
var  y BVi  X; 

output  out=bbO  mean=  ppy  ppBVi  ppX 

var  = vppy  vppBVi  vppX;*  Mean  & var  of  base  population; 

run; 

* Mass  Selection 

data  a3;  set  al; 

U = ranuni(-l  1);*  SEED=Clock; 
proc  sort  data=a3 ; 

by  Sample  env  XU;*  Selection  on  0/1 ; 

* by  Sample  env  U ;*  NO  Selection; 

* by  Sample  env  y;*  Selection  on  the  continuous  trait; 

mn; 

proc  datasets  library=work; 

delete  al;  run; 
data  a2; 

do  samp  = 1 to  &Nsamp; 
do  env  = 1 to  &Nenv; 
do  fam  = 1 to  «&size; 

output; 
end; 
end; 
end; 
data  a4; 

merge  a2  (in=TU)  a3;  if  TU; 

if  X=1  or  fam  > &nfam  then  delete;*  Selection  on  0/1 ; 

* if  fam  > &nfam  then  delete;*  No  selection ; 

drop  samp  U ; 

run; 

proc  sort  data=a4; 

by  sample  env  fam; 
proc  print;  run; 
proc  datasets  library=work; 

delete  a2  a3 ; run; 
proc  sort  data=a4; 
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by  sample  env; 
proc  means  noprint  data=a4; 
by  sample  env; 
var  Y X BVi; 

output  out  = bbl  n=nfam2  mean=  sY  sX  sBVi 

var  =vsY  vsX  vsBVi;*  mean  of  selected  indivs; 

run; 

data  cc7;  set  bbl  ;*  just  to  provide  a clean  nfam2  for  loops; 

drop  sY  sX  sBVi  vsY  vsX  vsBVi  _TYPE FREQj 

run; 

* Obtain  Progeny  of  Selected  individuals ; 

data  d2;  set  a4; 
male=  fam; 

BVf=BVi; 

BVm  = BVi; 
drop  X Y; 
run; 

data  d3 ; 

do  samp  = 1 to  &nsamp; 
do  env  = 1 to  &nenv; 
set  cc7  point=samp; 
do  fam  = 1 to  nfam2; 

do  i = 1 to  60  ;*  generates  more  males  than  required,  to  reduce  to  the  correct  value; 
AAA: 

U=ranuni(-1 1); 

male  = int((&nfam  + 0.5)*U);*  generates  id  for  male; 
if  male=fam  then  goto  AAA; 
if  male  LT  1 OR  male  GT  &nfam  then  goto  AAA; 
output; 
end; 
end; 
end; 
end; 
stop; 
run; 

data  d4;  set  d2; 

drop  fam  BVf; 
proc  sort  data=d4; 
by  sample  env  male; 
run; 

proc  sort  data=d3; 
by  sample  env  male; 
run; 


datadl; 

merge  d3  (in=TU)  d4;  If  TU; 
by  sample  env  male; 
drop  U i; 
run; 

proc  datasets  library=work; 

delete  d3  d4;  run; 
proc  sort  data=dl; 
by  sample  env  fam; 
run; 

data  d4;set  d2; 
drop  male  BVm; 
run; 

proc  sort  data=d4; 
by  sample  env  fam; 
run; 
data  dO; 

merge  dl  (in=TU)  d4;  If  TU; 
by  sample  env  fam; 
drop  BVi  blk; 
run; 

proc  datasets  library=work; 
delete  dl  d4;  run; 

proc  sort  data=dO(keep=sample  env  fam  male  BVf  BVm)  out=d5  noduplicates; 
by  sample  env  fam  male; 
run; 

proc  rank  data  = d5  out  = d6; 
by  sample  env  fam; 
var  male; 
ranks  trees; 
run; 

data  d7;  set  d6; 

if  trees  > («&nblk  * &ntrees)  then  delete; 

f=0.5*BVf+0.5*BVm; 

drop  BVf  BVm; 

run; 

* checking  if  a female  was  crossed  with  a particular  male  only  once; 
proc  sort  data=d7; 

by  sample  fam  male; 
proc  means  noprint  data=d7; 
by  sample  fam  male; 
var  f; 

output  out=d8  n=ntimes; 


run; 

proc  sort  data=d8; 

by  sample  ntimes; 
proc  means  noprint  data=d8; 
by  sample  ntimes; 
output  out=d8  n=rm; 
run; 

proc  datasets  library=work; 
delete  d5  d6  d8  a2  a4  c 1 ; 
run; 

data  a2  a3; 
seed  =int(time()); 
do  samp  = 1 to  &nsamp; 
do  env=  1 to  &nenv; 

* call  rannor(seed,e); 
set  cc7  point=samp; 
do  fam=l  to  nfam2; 

call  rannor(seed,fe); 
output  a2; 
end; 

do  blk=l  to  &nblk; 

* call  rannor(seed,b); 
do  fam=l  to  nfam2; 

* call  rannor(seed,p); 
do  trees=l  to  &ntrees; 

call  rannor(seed,w); 
output  a3 ; 
end; 
end; 
end; 
end; 
end; 
stop; 
run; 

data  d7;  set  d7  (keep=Sample  env  fam  trees  f); 
proc  sort  data=d7; 

by  Sample  env  fam  trees;  run; 
data  a2;  set  a2(keep=Sample  env  fam  fe); 
proc  sort  data=a2; 
by  Sample  fam;  run; 

data  a3;  set  a3(keep=Sample  env  blk  fam  trees  w); 
proc  sort  data=a3 ; 

by  Sample  env  fam;  run; 
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data  a4; 

merge  a2(in=TU)  d7 ; 
by  Sample  env  fam; 
ifTU; 

proc  sort  data=a4; 

by  Sample  env  fam  trees; 
run; 

proc  datasets  library=work; 
delete  a2  d7 ; 
run; 
data  al ; 

merge  a3(in=TU)  a4;  ifTU; 
by  Sample  env  fam  trees; 
ii=l; 
dev  = w; 
run; 

proc  datasets  library=work; 
delete  a3  a4; 
run; 

proc  sort  data=al ; 

by  sample  env  fam; 
proc  means  noprint  data=al; 
by  sample  env  fam; 
var  fe; 

output  out=a2  mean=dev; 
run; 

data  a3;  set  al  a2  (keep=dev); 
ii=l; 

proc  means  noprint  data=a3; 

byii; 

var  dev; 

output  out=a4  n=nn  mean=Ave  var=Vari; 
run; 

proc  print  data=a4; 
var  nn  Ave  Vari; 
run; 

proc  datasets  library=work; 

delete  a2  a3 ; run; 
data  a2; 

merge  al  (in=TU)  a4; 

byii; 

ifTU; 

e = (e  - Ave  )/sqrt(vari); 
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b = (b  - Ave  )/sqrt(vari); 
fe=  (fe  - Ave)/sqrt(vari); 
p = (p  - Ave  )/sqrt(vari); 
w = (w  - Ave  )/sqrt(vari); 

&VC 

* e = el*sqrt(Ve); 

*b  = bl*sqrt(Vb); 

f=f; 

fe=  fe*sqrt(Vfe); 

* p = p*sqrt(Vp); 

w=  w*sqrt(Vw  - 3*Vf  + 2*Vf); 

Y = M + f+fe  + w; 

*Y  = M + e + b + f+fe  + p + w; 

if  y > (M  + probit(l  - 0.&incid)*SDy)  then  X = 1 ; else  X = 0; 

drop  vsY  vsX  vsBVi  _TYPE ^FREQ_  Ave  Vari  dev; 

run; 

proc  datasets  library=work; 
delete  al  dO  d2  a4; 
run; 

proc  means  noprint  data=a2;*  Just  to  check  distributional  properties  of  factors; 
var  Y M f fe  w; 

output  out=a3  mean=  mY  mM  mf  mfe  mw 

var=  vY  vM  vf  vfe  vw; 

run; 

proc  print  data=a3 ; 

varmymMmfmfe  mw  vyvMvfvfe  vw; 
run; 

proc  sort  data=a2; 

by  sample  env; 
proc  means  noprint  data=a2; 
by  sample  env; 
var  y X ; 

output  out=bb2  mean=psy  psX;*  Means  of  the  progeny  of  selected  individuals; 
run; 

proc  datasets  library=work; 
delete  al  a2  a3;  run; 

* Realized  Gains  and  Heritabilities 

data  al ; 

merge  bbO  bbl  bb2; 
by  sample  Env; 

GRy  = psY  - ppY;*  Gain  in  underlying  scale; 
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GRx  = psX  - ppX;* 
h2y  = GRy/SDnwy;* 
h2x  = GRx/SDnwx;* 
drop  _TYPE FREQ 


Gain  in  0/1  scale; 

Heritability  in  underlying  scale/NON Weighted  SelDiff; 
Heritability  in  0/1  scale/NONweighted  seldiff; 
z blk  nfam2  perX; 


run; 

proc  datasets  library=work; 

delete  bbO  bb2  bb4;  run; 
proc  means  noprint  data=al; 

var  ppY  sY  psY  ppX  sX  psX  GRy  GRx  h2y  h2x; 
output  out=a6  n=n 

mean=  ppY  sY  psY  ppX  sX  psX  GRy  GRx  h2y  h2x 

var  = vppY  vsY  vpsY  vppX  vsX  vpsX  vGRy  vGRx  vh2y  vh2x; 

nm; 

proc  print  data=a6  NOOBS; 
var  ppY  sY  psY  ppX  sX  psX; 

titles  "Means  of  Base  (pp),  Selected  (s),  and  Progeny  of  Selected  Populations  (ps)"; 


run; 

proc  print  data=a6  NOOBS; 

var  vppY  vsY  vpsY  vppX  vsX  vpsX; 

titles  "Variance  of  Base  (pp),  Selected  (s),  and  Progeny  of  Selected  Populations  (ps)"; 
run; 

proc  print  data=a6  NOOBS; 

var  GRy  vGRy  GRx  vGRx  h2y  vh2y  h2x  vh2x; 
titles  "Mean  and  Variance  of  Gains  and  Heritabilities"; 


run; 


* Count  number  of  individuals  selected  per  sample ; 

proc  sort  data=bb  1 ; 
by  nfam2; 

proc  means  noprint  data=bbl; 
by  nfam2; 

output  out  = bbl  n=  ffeq; 
run; 

data  bbl;  set  bbl; 

MSG="  "; 

if  nfam2  < &nfam  then  MSG="Fewer  indivs  selected  than  desired"; 
proc  print  data=bbl  NOOBS; 
var  nfam2  ffeq  MSG; 

title2  "Number  of  Individuals  Selected  (nfam2)  and  Freq  of  Samples'; 
run; 

proc  datasets  library=work  KILL;  run; 
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